[RFC PATCH 0/2] x86: Fix missing core serialization on migration

linux-api.vger.kernel.org archive mirror

* [RFC PATCH 0/2] x86: Fix missing core serialization on migration
@ 2017-11-10 21:12 Mathieu Desnoyers
       [not found] ` <20171110211249.10742-1-mathieu.desnoyers-vg+e7yoeK/dWk0Htik3J/w@public.gmane.org>
  0 siblings, 1 reply; 26+ messages in thread
From: Mathieu Desnoyers @ 2017-11-10 21:12 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	linux-api-u79uwXL29TY76Z2rM5mHXA, Peter Zijlstra,
	Paul E . McKenney, Boqun Feng, Andrew Hunter, Maged Michael,
	Avi Kivity, Benjamin Herrenschmidt, Paul Mackerras,
	Michael Ellerman, Dave Watson, Thomas Gleixner, Ingo Molnar,
	H . Peter Anvin, Andrea Parri, Russell King, Greg Hackmann,
	Will Deacon, David Sehr, Linus

x86 can return to user-space through sysexit and sysretq, which are not
core serializing. This breaks expectations from user-space about
sequential consistency from a single-threaded self-modifying program
point of view in specific migration patterns.

Feedback is welcome,

Thanks,

Mathieu

Mathieu Desnoyers (2):
  x86: Introduce sync_core_before_usermode
  Fix: x86: Add missing core serializing instruction on migration

 arch/x86/Kconfig                 |  1 +
 arch/x86/include/asm/processor.h | 10 ++++++++++
 include/linux/processor.h        |  6 ++++++
 kernel/sched/core.c              |  7 +++++++
 kernel/sched/sched.h             |  1 +
 5 files changed, 25 insertions(+)

-- 
2.11.0

^ permalink raw reply	[flat|nested] 26+ messages in thread

* [RFC PATCH 1/2] x86: Introduce sync_core_before_usermode
       [not found] ` <20171110211249.10742-1-mathieu.desnoyers-vg+e7yoeK/dWk0Htik3J/w@public.gmane.org>
@ 2017-11-10 21:12   ` Mathieu Desnoyers
  2017-11-10 21:12   ` [RFC PATCH 2/2] Fix: x86: Add missing core serializing instruction on migration Mathieu Desnoyers
  2017-11-10 21:36   ` [RFC PATCH 0/2] x86: Fix missing core serialization " Linus Torvalds
  2 siblings, 0 replies; 26+ messages in thread
From: Mathieu Desnoyers @ 2017-11-10 21:12 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	linux-api-u79uwXL29TY76Z2rM5mHXA, Peter Zijlstra,
	Paul E . McKenney, Boqun Feng, Andrew Hunter, Maged Michael,
	Avi Kivity, Benjamin Herrenschmidt, Paul Mackerras,
	Michael Ellerman, Dave Watson, Thomas Gleixner, Ingo Molnar,
	H . Peter Anvin, Andrea Parri, Russell King, Greg Hackmann,
	Will Deacon, David Sehr, Linus

Introduce an architecture function that ensures the current CPU
issues a core serializing instruction before returning to usermode.

This is needed to fix an existing core serialization bug on
thread migration, and also needed by the membarrier "sync_core" command.

Architectures providing their own sync_core_before_usermode() static inline
need to select ARCH_HAS_SYNC_CORE_BEFORE_USERMODE in their Kconfig.
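
As a minimal sketch of this opt-in pattern (a user-space model, not kernel
code), the block below uses a hypothetical CONFIG_ARCH_HAS_SYNC_CORE_BEFORE_USERMODE
define to stand in for the Kconfig select, since selected Kconfig symbols
reach C code with a CONFIG_ prefix. The serializing_insns counter is also
hypothetical and only records that the architecture hook ran.

```c
#include <assert.h>

/*
 * Hypothetical stand-in for an architecture whose Kconfig selects
 * ARCH_HAS_SYNC_CORE_BEFORE_USERMODE. Delete this line to model an
 * architecture that relies on the generic stub instead.
 */
#define CONFIG_ARCH_HAS_SYNC_CORE_BEFORE_USERMODE 1

/* Hypothetical counter of simulated serializing instructions. */
static int serializing_insns;

#ifdef CONFIG_ARCH_HAS_SYNC_CORE_BEFORE_USERMODE
/* Architecture override: x86 would call sync_core() here. */
static inline void sync_core_before_usermode(void)
{
	serializing_insns++;
}
#else
/* Generic fallback: nothing to do before returning to user-mode. */
static inline void sync_core_before_usermode(void)
{
}
#endif
```

Dropping the #define selects the empty generic stub, so callers in core
code can invoke sync_core_before_usermode() unconditionally.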

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers-vg+e7yoeK/dWk0Htik3J/w@public.gmane.org>
CC: Peter Zijlstra <peterz-wEGCiKHe2LqWVfeAwA7xHQ@public.gmane.org>
CC: Andy Lutomirski <luto-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
CC: Paul E. McKenney <paulmck-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org>
CC: Boqun Feng <boqun.feng-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
CC: Andrew Hunter <ahh-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org>
CC: Maged Michael <maged.michael-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
CC: Avi Kivity <avi-VrcmuVmyx1hWk0Htik3J/w@public.gmane.org>
CC: Benjamin Herrenschmidt <benh-XVmvHMARGAS8U2dJNN8I7kB+6BGkLq7r@public.gmane.org>
CC: Paul Mackerras <paulus-eUNUBHrolfbYtjvyW6yDsg@public.gmane.org>
CC: Michael Ellerman <mpe-Gsx/Oe8HsFggBc27wqDAHg@public.gmane.org>
CC: Dave Watson <davejwatson-b10kYP2dOMg@public.gmane.org>
CC: Thomas Gleixner <tglx-hfZtesqFncYOwBW4kG4KsQ@public.gmane.org>
CC: Ingo Molnar <mingo-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
CC: "H. Peter Anvin" <hpa-YMNOUZJC4hwAvxtiuMwx3w@public.gmane.org>
CC: Andrea Parri <parri.andrea-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
CC: Russell King <linux-I+IVW8TIWO2tmTQ+vhA3Yw@public.gmane.org>
CC: Greg Hackmann <ghackmann-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org>
CC: Will Deacon <will.deacon-5wv7dgnIgG8@public.gmane.org>
CC: David Sehr <sehr-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org>
CC: Linus Torvalds <torvalds-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b@public.gmane.org>
CC: x86-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org
CC: linux-arch-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
---
 arch/x86/Kconfig                 |  1 +
 arch/x86/include/asm/processor.h | 10 ++++++++++
 include/linux/processor.h        |  6 ++++++
 3 files changed, 17 insertions(+)

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 01f78c1d40b5..54fbb8960d94 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -62,6 +62,7 @@ config X86
 	select ARCH_HAS_SG_CHAIN
 	select ARCH_HAS_STRICT_KERNEL_RWX
 	select ARCH_HAS_STRICT_MODULE_RWX
+	select ARCH_HAS_SYNC_CORE_BEFORE_USERMODE
 	select ARCH_HAS_UBSAN_SANITIZE_ALL
 	select ARCH_HAS_ZONE_DEVICE		if X86_64
 	select ARCH_HAVE_NMI_SAFE_CMPXCHG
diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h
index bdac19ab2488..6ce996a7c730 100644
--- a/arch/x86/include/asm/processor.h
+++ b/arch/x86/include/asm/processor.h
@@ -706,6 +706,16 @@ static inline void sync_core(void)
 #endif
 }
 
+/*
+ * Ensure that a core serializing instruction is issued before returning
+ * to user-mode. x86 implements return to user-space through sysexit and
+ * sysretq, which are not core serializing.
+ */
+static inline void sync_core_before_usermode(void)
+{
+	sync_core();
+}
+
 extern void select_idle_routine(const struct cpuinfo_x86 *c);
 extern void amd_e400_c1e_apic_setup(void);
 
diff --git a/include/linux/processor.h b/include/linux/processor.h
index dbc952eec869..7d12e6fa050e 100644
--- a/include/linux/processor.h
+++ b/include/linux/processor.h
@@ -68,4 +68,10 @@ do {								\
 
 #endif
 
+#ifndef CONFIG_ARCH_HAS_SYNC_CORE_BEFORE_USERMODE
+static inline void sync_core_before_usermode(void)
+{
+}
+#endif
+
 #endif /* _LINUX_PROCESSOR_H */
-- 
2.11.0

* [RFC PATCH 2/2] Fix: x86: Add missing core serializing instruction on migration
       [not found] ` <20171110211249.10742-1-mathieu.desnoyers-vg+e7yoeK/dWk0Htik3J/w@public.gmane.org>
  2017-11-10 21:12   ` [RFC PATCH 1/2] x86: Introduce sync_core_before_usermode Mathieu Desnoyers
@ 2017-11-10 21:12   ` Mathieu Desnoyers
  2017-11-10 21:36   ` [RFC PATCH 0/2] x86: Fix missing core serialization " Linus Torvalds
  2 siblings, 0 replies; 26+ messages in thread
From: Mathieu Desnoyers @ 2017-11-10 21:12 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	linux-api-u79uwXL29TY76Z2rM5mHXA, Peter Zijlstra,
	Paul E . McKenney, Boqun Feng, Andrew Hunter, Maged Michael,
	Avi Kivity, Benjamin Herrenschmidt, Paul Mackerras,
	Michael Ellerman, Dave Watson, Thomas Gleixner, Ingo Molnar,
	H . Peter Anvin, Andrea Parri, Russell King, Greg Hackmann,
	Will Deacon, David Sehr, Linus

x86 is missing a core serializing instruction in migration scenarios.

Given that x86-32 can return to user-space with sysexit, and x86-64
through sysretq and sysexit, which are not core serializing, the
following user-space self-modifying code (JIT) scenario can occur:

     CPU 0                      CPU 1

User-space self-modify code
Preempted
 migrated             ->
                                scheduler selects task
                                Return to user-space (iret or sysexit)
                                User-space issues sync_core()
                      <-        migrated
scheduler selects task
Return to user-space (sysexit)
jump to modified code
Run modified code without sync_core() -> bug.

This migration pattern can return to user-space through sysexit or
sysret64, neither of which is core serializing, and therefore breaks the
sequential consistency expected by a single-threaded process.

Fix this issue by invoking sync_core_before_usermode() the first
time a runqueue finishes a task switch after receiving a migrated
thread.
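
The flag handshake above can be modeled in user space as follows. struct
model_rq and the model_* functions are hypothetical stand-ins for struct
rq, move_queued_task() and finish_task_switch(); a counter replaces the
real serializing instruction.

```c
#include <assert.h>

/* Hypothetical user-space model of a runqueue; not the kernel's struct rq. */
struct model_rq {
	int need_sync_core;  /* set when a migrated task is enqueued */
	int sync_core_count; /* serializing instructions "issued" so far */
};

/* Models the hunk added to move_queued_task(): mark the destination rq. */
static void model_migrate_to(struct model_rq *rq)
{
	rq->need_sync_core = 1;
}

/* Models sync_core_before_usermode(): just count the call. */
static void model_sync_core_before_usermode(struct model_rq *rq)
{
	rq->sync_core_count++;
}

/* Models the hunk added to finish_task_switch(): serialize once, then clear. */
static void model_finish_task_switch(struct model_rq *rq)
{
	if (rq->need_sync_core) {
		model_sync_core_before_usermode(rq);
		rq->need_sync_core = 0;
	}
}
```

Because the flag is per runqueue, several migrations onto the same CPU
before the next task switch cost a single serializing instruction, and
the serialization happens on the first switch even if the task switched
in is not the migrated one.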

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers-vg+e7yoeK/dWk0Htik3J/w@public.gmane.org>
CC: Peter Zijlstra <peterz-wEGCiKHe2LqWVfeAwA7xHQ@public.gmane.org>
CC: Andy Lutomirski <luto-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
CC: Paul E. McKenney <paulmck-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org>
CC: Boqun Feng <boqun.feng-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
CC: Andrew Hunter <ahh-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org>
CC: Maged Michael <maged.michael-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
CC: Avi Kivity <avi-VrcmuVmyx1hWk0Htik3J/w@public.gmane.org>
CC: Benjamin Herrenschmidt <benh-XVmvHMARGAS8U2dJNN8I7kB+6BGkLq7r@public.gmane.org>
CC: Paul Mackerras <paulus-eUNUBHrolfbYtjvyW6yDsg@public.gmane.org>
CC: Michael Ellerman <mpe-Gsx/Oe8HsFggBc27wqDAHg@public.gmane.org>
CC: Dave Watson <davejwatson-b10kYP2dOMg@public.gmane.org>
CC: Thomas Gleixner <tglx-hfZtesqFncYOwBW4kG4KsQ@public.gmane.org>
CC: Ingo Molnar <mingo-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
CC: "H. Peter Anvin" <hpa-YMNOUZJC4hwAvxtiuMwx3w@public.gmane.org>
CC: Andrea Parri <parri.andrea-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
CC: Russell King <linux-I+IVW8TIWO2tmTQ+vhA3Yw@public.gmane.org>
CC: Greg Hackmann <ghackmann-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org>
CC: Will Deacon <will.deacon-5wv7dgnIgG8@public.gmane.org>
CC: David Sehr <sehr-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org>
CC: Linus Torvalds <torvalds-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b@public.gmane.org>
CC: x86-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org
CC: linux-arch-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
---
 kernel/sched/core.c  | 7 +++++++
 kernel/sched/sched.h | 1 +
 2 files changed, 8 insertions(+)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index c79e94278613..4a1c9782267a 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -927,6 +927,7 @@ static struct rq *move_queued_task(struct rq *rq, struct rq_flags *rf,
 
 	rq_lock(rq, rf);
 	BUG_ON(task_cpu(p) != new_cpu);
+	rq->need_sync_core = 1;
 	enqueue_task(rq, p, 0);
 	p->on_rq = TASK_ON_RQ_QUEUED;
 	check_preempt_curr(rq, p, 0);
@@ -2684,6 +2685,12 @@ static struct rq *finish_task_switch(struct task_struct *prev)
 	prev_state = prev->state;
 	vtime_task_switch(prev);
 	perf_event_task_sched_in(prev, current);
+#ifdef CONFIG_SMP
+	if (unlikely(rq->need_sync_core)) {
+		sync_core_before_usermode();
+		rq->need_sync_core = 0;
+	}
+#endif
 	finish_lock_switch(rq, prev);
 	finish_arch_post_lock_switch();
 
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index cab256c1720a..33e617bc491c 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -734,6 +734,7 @@ struct rq {
 	/* For active balancing */
 	int active_balance;
 	int push_cpu;
+	int need_sync_core;
 	struct cpu_stop_work active_balance_work;
 	/* cpu of this runqueue: */
 	int cpu;
-- 
2.11.0

* Re: [RFC PATCH 0/2] x86: Fix missing core serialization on migration
       [not found] ` <20171110211249.10742-1-mathieu.desnoyers-vg+e7yoeK/dWk0Htik3J/w@public.gmane.org>
  2017-11-10 21:12   ` [RFC PATCH 1/2] x86: Introduce sync_core_before_usermode Mathieu Desnoyers
  2017-11-10 21:12   ` [RFC PATCH 2/2] Fix: x86: Add missing core serializing instruction on migration Mathieu Desnoyers
@ 2017-11-10 21:36   ` Linus Torvalds
       [not found]     ` <CA+55aFzbroWqi+FTdYhRVSwUZ-M0wDVxjXqDbh40JEnXc2LdgQ-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  2 siblings, 1 reply; 26+ messages in thread
From: Linus Torvalds @ 2017-11-10 21:36 UTC (permalink / raw)
  To: Mathieu Desnoyers
  Cc: Andy Lutomirski, Linux Kernel Mailing List, Linux API,
	Peter Zijlstra, Paul E . McKenney, Boqun Feng, Andrew Hunter,
	Maged Michael, Avi Kivity, Benjamin Herrenschmidt, Paul Mackerras,
	Michael Ellerman, Dave Watson, Thomas Gleixner, Ingo Molnar,
	H . Peter Anvin, Andrea Parri, Russell King, Greg Hackmann, Will

On Fri, Nov 10, 2017 at 1:12 PM, Mathieu Desnoyers
<mathieu.desnoyers-vg+e7yoeK/dWk0Htik3J/w@public.gmane.org> wrote:
> x86 can return to user-space through sysexit and sysretq, which are not
> core serializing. This breaks expectations from user-space about
> sequential consistency from a single-threaded self-modifying program
> point of view in specific migration patterns.
>
> Feedback is welcome,

We should check with Intel. I would actually be surprised if the I$
can be out of sync with the D$ after a sysretq.  It would actually
break things like "read code from disk" too in theory.

Hpa? Can you check?

              Linus

* Re: [RFC PATCH 0/2] x86: Fix missing core serialization on migration
       [not found]     ` <CA+55aFzbroWqi+FTdYhRVSwUZ-M0wDVxjXqDbh40JEnXc2LdgQ-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2017-11-10 21:57       ` Mathieu Desnoyers
       [not found]         ` <885227610.13045.1510351034488.JavaMail.zimbra-vg+e7yoeK/dWk0Htik3J/w@public.gmane.org>
  0 siblings, 1 reply; 26+ messages in thread
From: Mathieu Desnoyers @ 2017-11-10 21:57 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Andy Lutomirski, linux-kernel, linux-api, Peter Zijlstra,
	Paul E. McKenney, Boqun Feng, Andrew Hunter, maged michael,
	Avi Kivity, Benjamin Herrenschmidt, Paul Mackerras,
	Michael Ellerman, Dave Watson, Thomas Gleixner, Ingo Molnar,
	H. Peter Anvin, Andrea Parri, Russell King, ARM Linux,
	Greg Hackmann, Will

----- On Nov 10, 2017, at 4:36 PM, Linus Torvalds torvalds-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b@public.gmane.org wrote:

> On Fri, Nov 10, 2017 at 1:12 PM, Mathieu Desnoyers
> <mathieu.desnoyers-vg+e7yoeK/dWk0Htik3J/w@public.gmane.org> wrote:
>> x86 can return to user-space through sysexit and sysretq, which are not
>> core serializing. This breaks expectations from user-space about
>> sequential consistency from a single-threaded self-modifying program
>> point of view in specific migration patterns.
>>
>> Feedback is welcome,
> 
> We should check with Intel. I would actually be surprised if the I$
> can be out of sync with the D$ after a sysretq.  It would actually
> break things like "read code from disk" too in theory.

That core serializing instruction is not that much about I$ vs D$
consistency, but rather about the processor speculatively executing code
ahead of its retirement point. Ref. Intel Architecture Software Developer's
Manual, Volume 3: System Programming.

7.1.3. "Handling Self- and Cross-Modifying Code":

"The act of a processor writing data into a currently executing code segment with the intent of
executing that data as code is called self-modifying code. Intel Architecture processors exhibit
model-specific behavior when executing self-modified code, depending upon how far ahead of
the current execution pointer the code has been modified. As processor architectures become
more complex and start to speculatively execute code ahead of the retirement point (as in the P6
family processors), the rules regarding which code should execute, pre- or post-modification,
become blurred. [...]"

AFAIU, this core serializing instruction seems to be needed for use-cases of
self-modifying code, but not for the initial load of a program from disk,
as the processor has no way to have speculatively executed any of its
instructions.
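
To make the single-threaded case concrete, here is a minimal
x86-64-only user-space sketch of the JIT pattern: generate code, issue a
serializing instruction, jump to it, then patch it and serialize again.
It assumes Linux mmap()/mprotect() and uses CPUID, which the SDM lists
among the serializing instructions, as a user-space stand-in for
sync_core(). The helper names are hypothetical.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stdint.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

#if !defined(__x86_64__)
#error "this sketch only targets x86-64"
#endif

/* Execute CPUID (leaf 0) as a user-space stand-in for sync_core(). */
static inline void user_sync_core(void)
{
	unsigned int eax = 0, ebx, ecx = 0, edx;

	__asm__ __volatile__("cpuid"
			     : "+a"(eax), "=b"(ebx), "+c"(ecx), "=d"(edx)
			     :
			     : "memory");
	(void)ebx;
	(void)edx;
}

/* Emit "mov eax, imm32; ret" (6 bytes) at buf. */
static void emit_return_imm(unsigned char *buf, int32_t imm)
{
	buf[0] = 0xb8;                      /* mov eax, imm32 */
	memcpy(&buf[1], &imm, sizeof(imm)); /* little-endian immediate */
	buf[5] = 0xc3;                      /* ret */
}

/*
 * Generate a function returning `first`, run it, patch it in place to
 * return `second`, and run it again, serializing before each call.
 * Returns the second result, or -1 if mmap()/mprotect() fails.
 */
static int jit_demo(int32_t first, int32_t second, int *out_first)
{
	size_t len = (size_t)sysconf(_SC_PAGESIZE);
	unsigned char *buf;
	int (*fn)(void);
	int result = -1;

	buf = mmap(NULL, len, PROT_READ | PROT_WRITE,
		   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (buf == MAP_FAILED)
		return -1;

	emit_return_imm(buf, first);
	if (mprotect(buf, len, PROT_READ | PROT_EXEC) != 0)
		goto out;
	user_sync_core();
	fn = (int (*)(void))(void *)buf;
	*out_first = fn();

	/* Self-modification: rewrite the immediate, then serialize again. */
	if (mprotect(buf, len, PROT_READ | PROT_WRITE) != 0)
		goto out;
	emit_return_imm(buf, second);
	if (mprotect(buf, len, PROT_READ | PROT_EXEC) != 0)
		goto out;
	user_sync_core();
	result = fn();
out:
	munmap(buf, len);
	return result;
}
```

The concern raised by this series is that the task may be migrated
between user_sync_core() and the call, in which case the serializing
instruction ran on a different CPU than the one executing the modified
code.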

Hopefully hpa can tell us more about this,

Thanks,

Mathieu

-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com

* Re: [RFC PATCH 0/2] x86: Fix missing core serialization on migration
       [not found]         ` <885227610.13045.1510351034488.JavaMail.zimbra-vg+e7yoeK/dWk0Htik3J/w@public.gmane.org>
@ 2017-11-10 22:12           ` Linus Torvalds
  2017-11-13 16:56           ` Mathieu Desnoyers
  1 sibling, 0 replies; 26+ messages in thread
From: Linus Torvalds @ 2017-11-10 22:12 UTC (permalink / raw)
  To: Mathieu Desnoyers
  Cc: Andy Lutomirski, linux-kernel, linux-api, Peter Zijlstra,
	Paul E. McKenney, Boqun Feng, Andrew Hunter, maged michael,
	Avi Kivity, Benjamin Herrenschmidt, Paul Mackerras,
	Michael Ellerman, Dave Watson, Thomas Gleixner, Ingo Molnar,
	H. Peter Anvin, Andrea Parri, Russell King, ARM Linux,
	Greg Hackmann, Will

On Fri, Nov 10, 2017 at 1:57 PM, Mathieu Desnoyers
<mathieu.desnoyers-vg+e7yoeK/dWk0Htik3J/w@public.gmane.org> wrote:
>
> That core serializing instruction is not that much about I$ vs D$
> consistency, but rather about the processor speculatively executing code
> ahead of its retirement point. Ref. Intel Architecture Software Developer's
> Manual, Volume 3: System Programming.

Oh, I know.

I'm just saying that the Intel docs wrt cross-modifying code are most
likely crap and overly defensive.

The sequence they _say_ is required can not possibly be required,
simply because people already depend on it not being required. We've
never had the serializing instruction in various other circumstances
when we switched from the old "iret" to "sysret".

I think it's kind of like the old memory ordering: Intel didn't really
document the real rules. They only started truly documenting what they
*really* did about ten years ago.

Remember when we thought you needed a locked instruction or a memory
barrier in between two reads, and our "smp_rmb()" was an actual
barrier instruction?

Yeah, that was always bogus, but it was what the (bad) Intel
documentation said you had to do. Then they started fixing their docs,
and now smp_rmb() is just a compiler barrier on x86.
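
As a rough illustration of the two-reads case in portable C11 (not the
kernel's smp_rmb() itself), the sketch below orders two loads with an
acquire fence. With GCC and Clang on x86, that fence is expected to emit
no instruction and only constrain compiler reordering, which mirrors
smp_rmb() becoming a compiler barrier. The publish() and try_consume()
names are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical shared state: a payload and a "ready" flag. */
static atomic_int payload = 0;
static atomic_int ready = 0;

/* Writer: store the payload, then publish it with a release store. */
static void publish(int value)
{
	atomic_store_explicit(&payload, value, memory_order_relaxed);
	atomic_store_explicit(&ready, 1, memory_order_release);
}

/*
 * Reader: two loads ordered by an acquire fence, in the spirit of
 * "read flag; smp_rmb(); read data". On x86, GCC and Clang are expected
 * to emit no fence instruction here, only a compiler barrier, since x86
 * does not reorder loads with other loads.
 */
static int try_consume(int *out)
{
	if (!atomic_load_explicit(&ready, memory_order_relaxed))
		return 0;
	atomic_thread_fence(memory_order_acquire);
	*out = atomic_load_explicit(&payload, memory_order_relaxed);
	return 1;
}
```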

It's about ten years ago that we committed b6c7347fffa6 ("x86:
optimise barriers") as a response to the Intel/AMD memory ordering
whitepaper (which is now part of the standard architecture manual, but
it

               Linus

* Re: [RFC PATCH 0/2] x86: Fix missing core serialization on migration
       [not found]         ` <885227610.13045.1510351034488.JavaMail.zimbra-vg+e7yoeK/dWk0Htik3J/w@public.gmane.org>
  2017-11-10 22:12           ` Linus Torvalds
@ 2017-11-13 16:56           ` Mathieu Desnoyers
       [not found]             ` <617343212.13932.1510592207202.JavaMail.zimbra-vg+e7yoeK/dWk0Htik3J/w@public.gmane.org>
  1 sibling, 1 reply; 26+ messages in thread
From: Mathieu Desnoyers @ 2017-11-13 16:56 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Andy Lutomirski, linux-kernel, linux-api, Peter Zijlstra,
	Paul E. McKenney, Boqun Feng, Andrew Hunter, maged michael,
	Avi Kivity, Benjamin Herrenschmidt, Paul Mackerras,
	Michael Ellerman, Dave Watson, Thomas Gleixner, Ingo Molnar,
	H. Peter Anvin, Andrea Parri, Russell King, ARM Linux,
	Greg Hackmann, Will

----- On Nov 10, 2017, at 4:57 PM, Mathieu Desnoyers mathieu.desnoyers-vg+e7yoeK/dWk0Htik3J/w@public.gmane.org wrote:

> ----- On Nov 10, 2017, at 4:36 PM, Linus Torvalds torvalds-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b@public.gmane.org
> wrote:
> 
>> On Fri, Nov 10, 2017 at 1:12 PM, Mathieu Desnoyers
>> <mathieu.desnoyers-vg+e7yoeK/dWk0Htik3J/w@public.gmane.org> wrote:
>>> x86 can return to user-space through sysexit and sysretq, which are not
>>> core serializing. This breaks expectations from user-space about
>>> sequential consistency from a single-threaded self-modifying program
>>> point of view in specific migration patterns.
>>>
>>> Feedback is welcome,
>> 
>> We should check with Intel. I would actually be surprised if the I$
>> can be out of sync with the D$ after a sysretq.  It would actually
>> break things like "read code from disk" too in theory.
> 
> That core serializing instruction is not that much about I$ vs D$
> consistency, but rather about the processor speculatively executing code
> ahead of its retirement point. Ref. Intel Architecture Software Developer's
> Manual, Volume 3: System Programming.
> 
> 7.1.3. "Handling Self- and Cross-Modifying Code":
> 
> "The act of a processor writing data into a currently executing code segment
> with the intent of executing that data as code is called self-modifying code.
> Intel Architecture processors exhibit model-specific behavior when executing
> self-modified code, depending upon how far ahead of the current execution
> pointer the code has been modified. As processor architectures become more
> complex and start to speculatively execute code ahead of the retirement point
> (as in the P6 family processors), the rules regarding which code should
> execute, pre- or post-modification, become blurred. [...]"
> 
> AFAIU, this core serializing instruction seems to be needed for use-cases of
> self-modifying code, but not for the initial load of a program from disk,
> as the processor has no way to have speculatively executed any of its
> instructions.

I figured out what you're pointing to: if exec() is executed by a previously
running thread, and there is no core serializing instruction between program
load and return to user-space, the kernel ends up acting like a JIT, indeed.

Therefore, we'd also need to invoke sync_core_before_usermode() after loading
the program.

Let's wait to hear back from hpa,

Thanks,

Mathieu


> 
> Hopefully hpa can tell us more about this,
> 
> Thanks,
> 
> Mathieu
> 
> 
> --
> Mathieu Desnoyers
> EfficiOS Inc.
> http://www.efficios.com

-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com

* Re: [RFC PATCH 0/2] x86: Fix missing core serialization on migration
       [not found]             ` <617343212.13932.1510592207202.JavaMail.zimbra-vg+e7yoeK/dWk0Htik3J/w@public.gmane.org>
@ 2017-11-13 17:14               ` Linus Torvalds
  2017-11-14 14:53               ` Avi Kivity
  1 sibling, 0 replies; 26+ messages in thread
From: Linus Torvalds @ 2017-11-13 17:14 UTC (permalink / raw)
  To: Mathieu Desnoyers
  Cc: Andy Lutomirski, linux-kernel, linux-api, Peter Zijlstra,
	Paul E. McKenney, Boqun Feng, Andrew Hunter, maged michael,
	Avi Kivity, Benjamin Herrenschmidt, Paul Mackerras,
	Michael Ellerman, Dave Watson, Thomas Gleixner, Ingo Molnar,
	H. Peter Anvin, Andrea Parri, Russell King, ARM Linux,
	Greg Hackmann, Will

On Mon, Nov 13, 2017 at 8:56 AM, Mathieu Desnoyers
<mathieu.desnoyers-vg+e7yoeK/dWk0Htik3J/w@public.gmane.org> wrote:
>
> I figured out what you're pointing to: if exec() is executed by a previously
> running thread, and there is no core serializing instruction between program
> load and return to user-space, the kernel ends up acting like a JIT, indeed.

Well, exec() is actually the least of our problems, because it will
have caused the virtual mapping to be set up too.

But we have had cases that haven't had that basically forever. Your
example of user-space doing an _unintentional_ cross-modification is
just such a case, but so is anybody doing their own code management in
user space by just reading their own executable into memory etc.

So part of the problem is that it's perfectly valid to generate code
and then just jump to it in x86 space as long as you stay on the same
CPU. And there has never been any guarantee that you wouldn't be
migrated in between.

In _practice_, I suspect that migration events are much much too big
for this to be an issue at all. And the trigger for migration is going
to be something like a timer interrupt that causes us to reschedule in
the first place - which ends up serializing due to the iret. And even
if the rescheduling is done by one CPU just doing a "schedule()", us
doing a re-balancing of CPUs, and another CPU then picking up the
process, there have been tens of thousands of instructions, several
spinlocks, lots of cross-CPU synchronization etc. going on.

I do not believe for a second that the CPU prefetching queue will be
active over those kinds of ranges and events.

So I don't really think the problem can actually occur in the first
place. I think the SDK rules are garbage.

But that's exactly why I'd actually really want to get some more real
rules from Intel and AMD. Because I think your patch is pointless, and
doesn't really fix anything in reality, but it's triggered by reading
the Intel SDK and going "in theory, this means that we would need to
do XYZ".

And when theory and practice do not match, I think (a) the theory is
bad, and (b) reality trumps theory.

In this case (b) means that I'm not super-eager to apply the patch,
and (a) means that since the theory is based on the Intel SDK, I think
we should consider the Intel SDK to be a problem, and ask for
clarification of just what the rules really are.

            Linus

* Re: [RFC PATCH 0/2] x86: Fix missing core serialization on migration
       [not found]             ` <617343212.13932.1510592207202.JavaMail.zimbra-vg+e7yoeK/dWk0Htik3J/w@public.gmane.org>
  2017-11-13 17:14               ` Linus Torvalds
@ 2017-11-14 14:53               ` Avi Kivity
       [not found]                 ` <4d47fbb8-8f99-19d3-a9cf-66841aeffac3-VrcmuVmyx1hWk0Htik3J/w@public.gmane.org>
  1 sibling, 1 reply; 26+ messages in thread
From: Avi Kivity @ 2017-11-14 14:53 UTC (permalink / raw)
  To: Mathieu Desnoyers, Linus Torvalds
  Cc: Andy Lutomirski, linux-kernel, linux-api, Peter Zijlstra,
	Paul E. McKenney, Boqun Feng, Andrew Hunter, maged michael,
	Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman,
	Dave Watson, Thomas Gleixner, Ingo Molnar, H. Peter Anvin,
	Andrea Parri, Russell King, ARM Linux, Greg Hackmann, Will Deacon



On 11/13/2017 06:56 PM, Mathieu Desnoyers wrote:
> ----- On Nov 10, 2017, at 4:57 PM, Mathieu Desnoyers mathieu.desnoyers-vg+e7yoeK/dWk0Htik3J/w@public.gmane.org wrote:
>
>> ----- On Nov 10, 2017, at 4:36 PM, Linus Torvalds torvalds-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b@public.gmane.org
>> wrote:
>>
>>> On Fri, Nov 10, 2017 at 1:12 PM, Mathieu Desnoyers
>>> <mathieu.desnoyers-vg+e7yoeK/dWk0Htik3J/w@public.gmane.org> wrote:
>>>> x86 can return to user-space through sysexit and sysretq, which are not
>>>> core serializing. This breaks expectations from user-space about
>>>> sequential consistency from a single-threaded self-modifying program
>>>> point of view in specific migration patterns.
>>>>
>>>> Feedback is welcome,
>>> We should check with Intel. I would actually be surprised if the I$
>>> can be out of sync with the D$ after a sysretq.  It would actually
>>> break things like "read code from disk" too in theory.
>> That core serializing instruction is not that much about I$ vs D$
>> consistency, but rather about the processor speculatively executing code
>> ahead of its retirement point. Ref. Intel Architecture Software Developer's
>> Manual, Volume 3: System Programming.
>>
>> 7.1.3. "Handling Self- and Cross-Modifying Code":
>>
>> "The act of a processor writing data into a currently executing code segment
>> with the intent of executing that data as code is called self-modifying code.
>> Intel Architecture processors exhibit model-specific behavior when executing
>> self-modified code, depending upon how far ahead of the current execution
>> pointer the code has been modified. As processor architectures become more
>> complex and start to speculatively execute code ahead of the retirement point
>> (as in the P6 family processors), the rules regarding which code should
>> execute, pre- or post-modification, become blurred. [...]"
>>
>> AFAIU, this core serializing instruction seems to be needed for use-cases of
>> self-modifying code, but not for the initial load of a program from disk,
>> as the processor has no way to have speculatively executed any of its
>> instructions.
> I figured out what you're pointing to: if exec() is executed by a previously
> running thread, and there is no core serializing instruction between program
> load and return to user-space, the kernel ends up acting like a JIT, indeed.

I think that's safe. The kernel has to execute a MOV CR3 instruction 
before it can execute code loaded by exec, and that is a serializing 
instruction. Loading and unloading shared libraries is made safe by the 
IRET executed by page faults (loading) and TLB shootdown IPIs (unloading).

Directly modifying code in userspace is unsafe if there is some 
non-coherent instruction cache. Instruction fetch and speculative 
execution are non-coherent, but they're probably too short (in current 
processors) to matter. Trace caches are probably large enough, but I 
don't know whether they are coherent or not.


>
> Therefore, we'd also need to invoke sync_core_before_usermode() after loading
> the program.
>
> Let's wait to hear back from hpa,
>
> Thanks,
>
> Mathieu
>
>
>> Hopefully hpa can tell us more about this,
>>
>> Thanks,
>>
>> Mathieu
>>
>>
>> --
>> Mathieu Desnoyers
>> EfficiOS Inc.
>> http://www.efficios.com

* Re: [RFC PATCH 0/2] x86: Fix missing core serialization on migration
       [not found]                 ` <4d47fbb8-8f99-19d3-a9cf-66841aeffac3-VrcmuVmyx1hWk0Htik3J/w@public.gmane.org>
@ 2017-11-14 15:17                   ` Mathieu Desnoyers
       [not found]                     ` <4431530.14831.1510672632887.JavaMail.zimbra-vg+e7yoeK/dWk0Htik3J/w@public.gmane.org>
  0 siblings, 1 reply; 26+ messages in thread
From: Mathieu Desnoyers @ 2017-11-14 15:17 UTC (permalink / raw)
  To: Avi Kivity
  Cc: Linus Torvalds, Andy Lutomirski, linux-kernel, linux-api,
	Peter Zijlstra, Paul E. McKenney, Boqun Feng, Andrew Hunter,
	maged michael, Benjamin Herrenschmidt, Paul Mackerras,
	Michael Ellerman, Dave Watson, Thomas Gleixner, Ingo Molnar,
	H. Peter Anvin, Andrea Parri, Russell King, ARM Linux,
	Greg Hackmann

----- On Nov 14, 2017, at 9:53 AM, Avi Kivity avi-VrcmuVmyx1hWk0Htik3J/w@public.gmane.org wrote:

> On 11/13/2017 06:56 PM, Mathieu Desnoyers wrote:
>> ----- On Nov 10, 2017, at 4:57 PM, Mathieu Desnoyers
>> mathieu.desnoyers-vg+e7yoeK/dWk0Htik3J/w@public.gmane.org wrote:
>>
>>> ----- On Nov 10, 2017, at 4:36 PM, Linus Torvalds torvalds-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b@public.gmane.org
>>> wrote:
>>>
>>>> On Fri, Nov 10, 2017 at 1:12 PM, Mathieu Desnoyers
>>>> <mathieu.desnoyers-vg+e7yoeK/dWk0Htik3J/w@public.gmane.org> wrote:
>>>>> x86 can return to user-space through sysexit and sysretq, which are not
>>>>> core serializing. This breaks expectations from user-space about
>>>>> sequential consistency from a single-threaded self-modifying program
>>>>> point of view in specific migration patterns.
>>>>>
>>>>> Feedback is welcome,
>>>> We should check with Intel. I would actually be surprised if the I$
>>>> can be out of sync with the D$ after a sysretq.  It would actually
>>>> break things like "read code from disk" too in theory.
>>> That core serializing instruction is not that much about I$ vs D$
>>> consistency, but rather about the processor speculatively executing code
>>> ahead of its retirement point. Ref. Intel Architecture Software Developer's
>>> Manual, Volume 3: System Programming.
>>>
>>> 7.1.3. "Handling Self- and Cross-Modifying Code":
>>>
>>> "The act of a processor writing data into a currently executing code segment
>>> with the intent of executing that data as code is called self-modifying code.
>>> Intel Architecture processors exhibit model-specific behavior when executing
>>> self-modified code, depending upon how far ahead of the current execution
>>> pointer the code has been modified. As processor architectures become more
>>> complex and start to speculatively execute code ahead of the retirement point
>>> (as in the P6 family processors), the rules regarding which code should
>>> execute, pre- or post-modification, become blurred. [...]"
>>>
>>> AFAIU, this core serializing instruction seems to be needed for use-cases of
>>> self-modifying code, but not for the initial load of a program from disk,
>>> as the processor has no way to have speculatively executed any of its
>>> instructions.
>> I figured out what you're pointing to: if exec() is executed by a previously
>> running thread, and there is no core serializing instruction between program
>> load and return to user-space, the kernel ends up acting like a JIT, indeed.
> 
> I think that's safe. The kernel has to execute a MOV CR3 instruction
> before it can execute code loaded by exec, and that is a serializing
> instruction. Loading and unloading shared libraries is made safe by the
> IRET executed by page faults (loading) and TLB shootdown IPIs (unloading).

Very good points! Perhaps those guarantees should be documented somewhere ?

> 
> Directly modifying code in userspace is unsafe if there is some
> non-coherent instruction cache. Instruction fetch and speculative
> execution are non-coherent, but they're probably too short (in current
> processors) to matter. Trace caches are probably large enough, but I
> don't know whether they are coherent or not.

Android guys at Google have reproducers of context synchronization issues
on arm64 in JIT scenarios. Based on the information I got, flushing the
instruction caches is not enough: they also need to issue a context
synchronizing instruction.

Perhaps the current Intel processors may have short enough speculative
execution and small enough trace caches, but relying on this without
a clear statement from Intel seems fragile.

I've tried to create a small single-threaded self-modifying loop in
user-space to trigger a trace cache or speculative execution quirk,
but I have not succeeded yet. I suspect that I would need to know
more about the internals of the processor architecture to create the
right stalls that would allow speculative execution to move further
ahead, and trigger an incoherent execution flow. Ideas on how to
trigger this would be welcome.

Thanks,

Mathieu


> 
> 
>>
>> Therefore, we'd also need to invoke sync_core_before_usermode() after loading
>> the program.
>>
>> Let's wait to hear back from hpa,
>>
>> Thanks,
>>
>> Mathieu
>>
>>
>>> Hopefully hpa can tell us more about this,
>>>
>>> Thanks,
>>>
>>> Mathieu
>>>
>>>
>>> --
>>> Mathieu Desnoyers
>>> EfficiOS Inc.
>>> http://www.efficios.com

-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com


* Re: [RFC PATCH 0/2] x86: Fix missing core serialization on migration
       [not found]                     ` <4431530.14831.1510672632887.JavaMail.zimbra-vg+e7yoeK/dWk0Htik3J/w@public.gmane.org>
@ 2017-11-14 15:42                       ` Avi Kivity
  2017-11-14 16:05                       ` Peter Zijlstra
  1 sibling, 0 replies; 26+ messages in thread
From: Avi Kivity @ 2017-11-14 15:42 UTC (permalink / raw)
  To: Mathieu Desnoyers
  Cc: Linus Torvalds, Andy Lutomirski, linux-kernel, linux-api,
	Peter Zijlstra, Paul E. McKenney, Boqun Feng, Andrew Hunter,
	maged michael, Benjamin Herrenschmidt, Paul Mackerras,
	Michael Ellerman, Dave Watson, Thomas Gleixner, Ingo Molnar,
	H. Peter Anvin, Andrea Parri, Russell King, ARM Linux,
	Greg Hackmann



On 11/14/2017 05:17 PM, Mathieu Desnoyers wrote:
> ----- On Nov 14, 2017, at 9:53 AM, Avi Kivity avi-VrcmuVmyx1hWk0Htik3J/w@public.gmane.org wrote:
>
>> On 11/13/2017 06:56 PM, Mathieu Desnoyers wrote:
>>> ----- On Nov 10, 2017, at 4:57 PM, Mathieu Desnoyers
>>> mathieu.desnoyers-vg+e7yoeK/dWk0Htik3J/w@public.gmane.org wrote:
>>>
>>>> ----- On Nov 10, 2017, at 4:36 PM, Linus Torvalds torvalds-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b@public.gmane.org
>>>> wrote:
>>>>
>>>>> On Fri, Nov 10, 2017 at 1:12 PM, Mathieu Desnoyers
>>>>> <mathieu.desnoyers-vg+e7yoeK/dWk0Htik3J/w@public.gmane.org> wrote:
>>>>>> x86 can return to user-space through sysexit and sysretq, which are not
>>>>>> core serializing. This breaks expectations from user-space about
>>>>>> sequential consistency from a single-threaded self-modifying program
>>>>>> point of view in specific migration patterns.
>>>>>>
>>>>>> Feedback is welcome,
>>>>> We should check with Intel. I would actually be surprised if the I$
>>>>> can be out of sync with the D$ after a sysretq.  It would actually
>>>>> break things like "read code from disk" too in theory.
>>>> That core serializing instruction is not that much about I$ vs D$
>>>> consistency, but rather about the processor speculatively executing code
>>>> ahead of its retirement point. Ref. Intel Architecture Software Developer's
>>>> Manual, Volume 3: System Programming.
>>>>
>>>> 7.1.3. "Handling Self- and Cross-Modifying Code":
>>>>
>>>> "The act of a processor writing data into a currently executing code segment
>>>> with the intent of executing that data as code is called self-modifying code.
>>>> Intel Architecture processors exhibit model-specific behavior when executing
>>>> self-modified code, depending upon how far ahead of the current execution
>>>> pointer the code has been modified. As processor architectures become more
>>>> complex and start to speculatively execute code ahead of the retirement point
>>>> (as in the P6 family processors), the rules regarding which code should
>>>> execute, pre- or post-modification, become blurred. [...]"
>>>>
>>>> AFAIU, this core serializing instruction seems to be needed for use-cases of
>>>> self-modifying code, but not for the initial load of a program from disk,
>>>> as the processor has no way to have speculatively executed any of its
>>>> instructions.
>>> I figured out what you're pointing to: if exec() is executed by a previously
>>> running thread, and there is no core serializing instruction between program
>>> load and return to user-space, the kernel ends up acting like a JIT, indeed.
>> I think that's safe. The kernel has to execute a MOV CR3 instruction
>> before it can execute code loaded by exec, and that is a serializing
>> instruction. Loading and unloading shared libraries is made safe by the
>> IRET executed by page faults (loading) and TLB shootdown IPIs (unloading).
> Very good points! Perhaps those guarantees should be documented somewhere ?
>
>> Directly modifying code in userspace is unsafe if there is some
>> non-coherent instruction cache. Instruction fetch and speculative
>> execution are non-coherent, but they're probably too short (in current
>> processors) to matter. Trace caches are probably large enough, but I
>> don't know whether they are coherent or not.
> Android guys at Google have reproducers of context synchronization issues
> on arm64 in JIT scenarios. Based on the information I got, flushing the
> instruction caches is not enough: they also need to issue a context
> synchronizing instruction.
>
> Perhaps the current Intel processors may have short enough speculative
> execution and small enough trace caches, but relying on this without
> a clear statement from Intel seems fragile.

A small trace cache is still vulnerable; the question is whether it is
coherent or not.

> I've tried to create a small single-threaded self-modifying loop in
> user-space to trigger a trace cache or speculative execution quirk,
> but I have not succeeded yet. I suspect that I would need to know
> more about the internals of the processor architecture to create the
> right stalls that would allow speculative execution to move further
> ahead, and trigger an incoherent execution flow. Ideas on how to
> trigger this would be welcome.
>
>


Intel CPUs resynchronize as soon as you jump (in single-threaded execution),
so you need to update ahead of the current instruction pointer to see
something. Not sure what quirk you're interested in seeing: executing
the old code? That's not very exciting.


* Re: [RFC PATCH 0/2] x86: Fix missing core serialization on migration
       [not found]                     ` <4431530.14831.1510672632887.JavaMail.zimbra-vg+e7yoeK/dWk0Htik3J/w@public.gmane.org>
  2017-11-14 15:42                       ` Avi Kivity
@ 2017-11-14 16:05                       ` Peter Zijlstra
       [not found]                         ` <20171114160541.GC3165-IIpfhp3q70x9+YH6RuovlLjjLBE8jN/0@public.gmane.org>
  1 sibling, 1 reply; 26+ messages in thread
From: Peter Zijlstra @ 2017-11-14 16:05 UTC (permalink / raw)
  To: Mathieu Desnoyers
  Cc: Avi Kivity, Linus Torvalds, Andy Lutomirski, linux-kernel,
	linux-api, Paul E. McKenney, Boqun Feng, Andrew Hunter,
	maged michael, Benjamin Herrenschmidt, Paul Mackerras,
	Michael Ellerman, Dave Watson, Thomas Gleixner, Ingo Molnar,
	H. Peter Anvin, Andrea Parri, Russell King, ARM Linux,
	Greg Hackmann

On Tue, Nov 14, 2017 at 03:17:12PM +0000, Mathieu Desnoyers wrote:
> I've tried to create a small single-threaded self-modifying loop in
> user-space to trigger a trace cache or speculative execution quirk,
> but I have not succeeded yet. I suspect that I would need to know
> more about the internals of the processor architecture to create the
> right stalls that would allow speculative execution to move further
> ahead, and trigger an incoherent execution flow. Ideas on how to
> trigger this would be welcome.

I thought the whole problem was per definition multi-threaded.

Single-threaded stuff can't get out of sync with itself; you'll always
observe your own stores.

And ISTR the JIT scenario being something like the JIT overwriting
previously executed but supposedly no longer used code. And in this
scenario you'd want to guarantee all CPUs observe the new code before
jumping into it.

The current approach is using mprotect(), except that on a number of
platforms the TLB invalidate from that is not guaranteed to be strong
enough to sync for code changes.

On x86 the mprotect() should work just fine, since we broadcast IPIs for
the TLB invalidate and the IRET from those will get the things synced up
again (if nothing else; very likely we'll have done a MOV-CR3 which will
of course also have sufficient syncness on it).

But PowerPC, s390, ARM et al that do TLB invalidates without interrupts
and don't guarantee their TLB invalidate sync against execution units
are left broken by this scheme.
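A minimal user-space sketch of the mprotect() publishing scheme described above follows. It assumes Linux on x86-64 or arm64. The names emit_return_imm() and jit_return_imm() are hypothetical, and the hand-encoded instructions only return a constant. __builtin___clear_cache() is the GCC/Clang builtin for instruction-cache maintenance over a range. It handles cache maintenance only. It does not address the context-synchronization issue raised for arm64 earlier in the thread.

```c
#define _GNU_SOURCE
#include <stdint.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

typedef int (*jit_fn)(void);

/* Hypothetical: emit "return imm" for the host ISA. Returns the number of
 * bytes written, or 0 if this sketch has no encoding for the host. */
static size_t emit_return_imm(unsigned char *buf, uint16_t imm)
{
#if defined(__x86_64__)
    buf[0] = 0xb8;                          /* mov eax, imm32 */
    uint32_t v = imm;
    memcpy(buf + 1, &v, sizeof v);          /* little-endian immediate */
    buf[5] = 0xc3;                          /* ret */
    return 6;
#elif defined(__aarch64__)
    uint32_t insn[2] = {
        0x52800000u | ((uint32_t)imm << 5), /* movz w0, #imm */
        0xd65f03c0u,                        /* ret */
    };
    memcpy(buf, insn, sizeof insn);
    return sizeof insn;
#else
    (void)buf;
    (void)imm;
    return 0;
#endif
}

/* Hypothetical JIT "publish" step: write code while the page is RW, then
 * flip it to RX with mprotect() before executing it. */
static int jit_return_imm(uint16_t imm, int *result)
{
    long page = sysconf(_SC_PAGESIZE);
    unsigned char *p = mmap(NULL, (size_t)page, PROT_READ | PROT_WRITE,
                            MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return -1;

    size_t len = emit_return_imm(p, imm);
    if (len == 0) {
        munmap(p, (size_t)page);
        return -1;
    }

    /* Needed where the I-cache is not coherent with stores (e.g. arm64). */
    __builtin___clear_cache((char *)p, (char *)p + len);

    /* W^X transition. Per the discussion above, on x86 any TLB-shootdown
     * IPIs this triggers return via IRET on the receiving CPUs; other
     * architectures make no such guarantee. */
    if (mprotect(p, (size_t)page, PROT_READ | PROT_EXEC) != 0) {
        munmap(p, (size_t)page);
        return -1;
    }

    jit_fn fn = (jit_fn)(void *)p;
    *result = fn();
    munmap(p, (size_t)page);
    return 0;
}
```

For example, jit_return_imm(42, &r) is expected to leave 42 in r. It returns -1 instead if the host refuses the mapping or is neither x86-64 nor arm64. The sketch covers only the publishing thread. Whether other CPUs observe the new code is the architecture-dependent part discussed above.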


* Re: [RFC PATCH 0/2] x86: Fix missing core serialization on migration
       [not found]                         ` <20171114160541.GC3165-IIpfhp3q70x9+YH6RuovlLjjLBE8jN/0@public.gmane.org>
@ 2017-11-14 16:08                           ` Peter Zijlstra
  2017-11-14 16:49                             ` Mathieu Desnoyers
  2017-11-14 16:10                           ` Andy Lutomirski
  1 sibling, 1 reply; 26+ messages in thread
From: Peter Zijlstra @ 2017-11-14 16:08 UTC (permalink / raw)
  To: Mathieu Desnoyers
  Cc: Avi Kivity, Linus Torvalds, Andy Lutomirski, linux-kernel,
	linux-api, Paul E. McKenney, Boqun Feng, Andrew Hunter,
	maged michael, Benjamin Herrenschmidt, Paul Mackerras,
	Michael Ellerman, Dave Watson, Thomas Gleixner, Ingo Molnar,
	H. Peter Anvin, Andrea Parri, Russell King, ARM Linux,
	Greg Hackmann

On Tue, Nov 14, 2017 at 05:05:41PM +0100, Peter Zijlstra wrote:
> On Tue, Nov 14, 2017 at 03:17:12PM +0000, Mathieu Desnoyers wrote:
> > I've tried to create a small single-threaded self-modifying loop in
> > user-space to trigger a trace cache or speculative execution quirk,
> > but I have not succeeded yet. I suspect that I would need to know
> > more about the internals of the processor architecture to create the
> > right stalls that would allow speculative execution to move further
> > ahead, and trigger an incoherent execution flow. Ideas on how to
> > trigger this would be welcome.
> 
> I thought the whole problem was per definition multi-threaded.
> 
> Single-threaded stuff can't get out of sync with itself; you'll always
> observe your own stores.

And even if you could, you can always execute a local serializing
instruction like CPUID to force things.
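For reference, here is a minimal sketch of such a local serializing step in user-space C. It assumes GCC or Clang extended inline asm on x86. The helper name serialize_core() is hypothetical and is not the kernel's sync_core(). It issues CPUID leaf 0 and discards the results.

```c
#include <stdint.h>

/* Hypothetical helper: run CPUID (leaf 0) only for its architectural
 * serializing side effect. Returns 1 if CPUID was executed, 0 on non-x86
 * builds, where only a compiler barrier is emitted. That barrier does NOT
 * serialize the instruction stream. */
static int serialize_core(void)
{
#if defined(__x86_64__) || defined(__i386__)
    uint32_t eax = 0, ebx, ecx = 0, edx;
    __asm__ __volatile__("cpuid"
                         : "+a"(eax), "=b"(ebx), "+c"(ecx), "=d"(edx)
                         :
                         : "memory");
    (void)ebx;
    (void)edx;
    return 1;
#else
    __asm__ __volatile__("" ::: "memory");
    return 0;
#endif
}
```

Typical use: store the new instruction bytes, call serialize_core(), then branch to the modified code.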

> And ISTR the JIT scenario being something like the JIT overwriting
> previously executed but supposedly no longer used code. And in this
> scenario you'd want to guarantee all CPUs observe the new code before
> jumping into it.
> 
> The current approach is using mprotect(), except that on a number of
> platforms the TLB invalidate from that is not guaranteed to be strong
> enough to sync for code changes.
> 
> On x86 the mprotect() should work just fine, since we broadcast IPIs for
> the TLB invalidate and the IRET from those will get the things synced up
> again (if nothing else; very likely we'll have done a MOV-CR3 which will
> of course also have sufficient syncness on it).
> 
> But PowerPC, s390, ARM et al that do TLB invalidates without interrupts
> and don't guarantee their TLB invalidate sync against execution units
> are left broken by this scheme.
> 


* Re: [RFC PATCH 0/2] x86: Fix missing core serialization on migration
  2017-11-14 16:08                           ` Peter Zijlstra
@ 2017-11-14 16:49                             ` Mathieu Desnoyers
  2017-11-14 17:03                               ` Avi Kivity
  0 siblings, 1 reply; 26+ messages in thread
From: Mathieu Desnoyers @ 2017-11-14 16:49 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Avi Kivity, Linus Torvalds, Andy Lutomirski, linux-kernel,
	linux-api, Paul E. McKenney, Boqun Feng, Andrew Hunter,
	maged michael, Benjamin Herrenschmidt, Paul Mackerras,
	Michael Ellerman, Dave Watson, Thomas Gleixner, Ingo Molnar,
	H. Peter Anvin, Andrea Parri, Russell King, ARM Linux,
	Greg Hackmann

----- On Nov 14, 2017, at 11:08 AM, Peter Zijlstra peterz-wEGCiKHe2LqWVfeAwA7xHQ@public.gmane.org wrote:

> On Tue, Nov 14, 2017 at 05:05:41PM +0100, Peter Zijlstra wrote:
>> On Tue, Nov 14, 2017 at 03:17:12PM +0000, Mathieu Desnoyers wrote:
>> > I've tried to create a small single-threaded self-modifying loop in
>> > user-space to trigger a trace cache or speculative execution quirk,
>> > but I have not succeeded yet. I suspect that I would need to know
>> > more about the internals of the processor architecture to create the
>> > right stalls that would allow speculative execution to move further
>> > ahead, and trigger an incoherent execution flow. Ideas on how to
>> > trigger this would be welcome.
>> 
>> I thought the whole problem was per definition multi-threaded.
>> 
>> Single-threaded stuff can't get out of sync with itself; you'll always
>> observe your own stores.
> 
> And even if you could, you can always execute a local serializing
> instruction like CPUID to force things.

What I'm trying to reproduce is something that breaks in the single-threaded
case if I explicitly leave out the CPUID core serializing instruction
when doing code modification on upcoming code, in a loop.

AFAIU, Intel requires a core serializing instruction to be issued even
in single-threaded scenarios between code update and execution, to ensure
that speculative execution does not observe incoherent code. Now the
question we all have for Intel is: is this requirement too strong, or
required by reality ?

Thanks,

Mathieu

> 
>> And ISTR the JIT scenario being something like the JIT overwriting
>> previously executed but supposedly no longer used code. And in this
>> scenario you'd want to guarantee all CPUs observe the new code before
>> jumping into it.
>> 
>> The current approach is using mprotect(), except that on a number of
>> platforms the TLB invalidate from that is not guaranteed to be strong
>> enough to sync for code changes.
>> 
>> On x86 the mprotect() should work just fine, since we broadcast IPIs for
>> the TLB invalidate and the IRET from those will get the things synced up
>> again (if nothing else; very likely we'll have done a MOV-CR3 which will
>> of course also have sufficient syncness on it).
>> 
>> But PowerPC, s390, ARM et al that do TLB invalidates without interrupts
>> and don't guarantee their TLB invalidate sync against execution units
>> are left broken by this scheme.

-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com


* Re: [RFC PATCH 0/2] x86: Fix missing core serialization on migration
  2017-11-14 16:49                             ` Mathieu Desnoyers
@ 2017-11-14 17:03                               ` Avi Kivity
       [not found]                                 ` <98b50de6-4cb1-9c43-4353-9ee7135dc63f-VrcmuVmyx1hWk0Htik3J/w@public.gmane.org>
  0 siblings, 1 reply; 26+ messages in thread
From: Avi Kivity @ 2017-11-14 17:03 UTC (permalink / raw)
  To: Mathieu Desnoyers, Peter Zijlstra
  Cc: Linus Torvalds, Andy Lutomirski, linux-kernel, linux-api,
	Paul E. McKenney, Boqun Feng, Andrew Hunter, maged michael,
	Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman,
	Dave Watson, Thomas Gleixner, Ingo Molnar, H. Peter Anvin,
	Andrea Parri, Russell King, ARM Linux, Greg Hackmann, Will Deacon



On 11/14/2017 06:49 PM, Mathieu Desnoyers wrote:
> ----- On Nov 14, 2017, at 11:08 AM, Peter Zijlstra peterz@infradead.org wrote:
>
>> On Tue, Nov 14, 2017 at 05:05:41PM +0100, Peter Zijlstra wrote:
>>> On Tue, Nov 14, 2017 at 03:17:12PM +0000, Mathieu Desnoyers wrote:
>>>> I've tried to create a small single-threaded self-modifying loop in
>>>> user-space to trigger a trace cache or speculative execution quirk,
>>>> but I have not succeeded yet. I suspect that I would need to know
>>>> more about the internals of the processor architecture to create the
>>>> right stalls that would allow speculative execution to move further
>>>> ahead, and trigger an incoherent execution flow. Ideas on how to
>>>> trigger this would be welcome.
>>> I thought the whole problem was per definition multi-threaded.
>>>
>>> Single-threaded stuff can't get out of sync with itself; you'll always
>>> observe your own stores.
>> And even if you could, you can always execute a local serializing
>> instruction like CPUID to force things.
> What I'm trying to reproduce is something that breaks in the single-threaded
> case if I explicitly leave out the CPUID core serializing instruction
> when doing code modification on upcoming code, in a loop.
>
> AFAIU, Intel requires a core serializing instruction to be issued even
> in single-threaded scenarios between code update and execution, to ensure
> that speculative execution does not observe incoherent code. Now the
> question we all have for Intel is: is this requirement too strong, or
> required by reality ?
>

In single-threaded execution, a jump is enough.

"As processor microarchitectures become more complex and start to
speculatively execute code ahead of the retirement point (as in P6 and
more recent processor families), the rules regarding which code should
execute, pre- or post-modification, become blurred. To write
self-modifying code and ensure that it is compliant with current and
future versions of the IA-32 architectures, use one of the following
coding options:

(* OPTION 1 *)
Store modified code (as data) into code segment;
Jump to new code or an intermediate location;
Execute new code;"
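Below is a minimal sketch of Option 1 as quoted above. It assumes Linux on x86-64 and a system policy that allows a writable and executable anonymous mapping. smc_option1() is a hypothetical name. It stores a new immediate into an already-executed `mov eax, imm32; ret` stub, then reaches that stub again through an indirect call, which is the branch the quoted text requires. Because of that branch, the sketch is not expected to exhibit incoherent execution.

```c
#define _GNU_SOURCE
#include <stdint.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

#if defined(__x86_64__)
typedef int (*smc_fn)(void);

/* Hypothetical sketch of SDM "Option 1": store modified code as data,
 * then reach it through a branch (here, an indirect call). Returns -1 if
 * the system refuses a writable and executable mapping. */
static int smc_option1(int *before, int *after)
{
    long page = sysconf(_SC_PAGESIZE);
    unsigned char *code = mmap(NULL, (size_t)page,
                               PROT_READ | PROT_WRITE | PROT_EXEC,
                               MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (code == MAP_FAILED)
        return -1;

    /* mov eax, 1 ; ret */
    static const unsigned char tmpl[] = { 0xb8, 1, 0, 0, 0, 0xc3 };
    memcpy(code, tmpl, sizeof tmpl);
    smc_fn fn = (smc_fn)(void *)code;
    *before = fn();                        /* executes the original code */

    uint32_t two = 2;
    memcpy(code + 1, &two, sizeof two);    /* store: patch imm32 to 2 */
    *after = fn();                         /* jump (indirect call), execute */

    munmap(code, (size_t)page);
    return 0;
}
#else
static int smc_option1(int *before, int *after)
{
    (void)before;
    (void)after;
    return -1;  /* sketch only encodes x86-64 instructions */
}
#endif
```

On the hardware discussed here, *before should be 1 and *after should be 2. A return of -1 means either the mapping was refused or the build is not x86-64.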


* Re: [RFC PATCH 0/2] x86: Fix missing core serialization on migration
       [not found]                                 ` <98b50de6-4cb1-9c43-4353-9ee7135dc63f-VrcmuVmyx1hWk0Htik3J/w@public.gmane.org>
@ 2017-11-14 17:10                                   ` Mathieu Desnoyers
       [not found]                                     ` <1216732828.15017.1510679404571.JavaMail.zimbra-vg+e7yoeK/dWk0Htik3J/w@public.gmane.org>
  0 siblings, 1 reply; 26+ messages in thread
From: Mathieu Desnoyers @ 2017-11-14 17:10 UTC (permalink / raw)
  To: Avi Kivity
  Cc: Peter Zijlstra, Linus Torvalds, Andy Lutomirski, linux-kernel,
	linux-api, Paul E. McKenney, Boqun Feng, Andrew Hunter,
	maged michael, Benjamin Herrenschmidt, Paul Mackerras,
	Michael Ellerman, Dave Watson, Thomas Gleixner, Ingo Molnar,
	H. Peter Anvin, Andrea Parri, Russell King, ARM Linux,
	Greg Hackmann

----- On Nov 14, 2017, at 12:03 PM, Avi Kivity avi-VrcmuVmyx1hWk0Htik3J/w@public.gmane.org wrote:

> On 11/14/2017 06:49 PM, Mathieu Desnoyers wrote:
>> ----- On Nov 14, 2017, at 11:08 AM, Peter Zijlstra peterz-wEGCiKHe2LqWVfeAwA7xHQ@public.gmane.org wrote:
>>
>>> On Tue, Nov 14, 2017 at 05:05:41PM +0100, Peter Zijlstra wrote:
>>>> On Tue, Nov 14, 2017 at 03:17:12PM +0000, Mathieu Desnoyers wrote:
>>>>> I've tried to create a small single-threaded self-modifying loop in
>>>>> user-space to trigger a trace cache or speculative execution quirk,
>>>>> but I have not succeeded yet. I suspect that I would need to know
>>>>> more about the internals of the processor architecture to create the
>>>>> right stalls that would allow speculative execution to move further
>>>>> ahead, and trigger an incoherent execution flow. Ideas on how to
>>>>> trigger this would be welcome.
>>>> I thought the whole problem was per definition multi-threaded.
>>>>
>>>> Single-threaded stuff can't get out of sync with itself; you'll always
>>>> observe your own stores.
>>> And even if you could, you can always execute a local serializing
>>> instruction like CPUID to force things.
>> What I'm trying to reproduce is something that breaks in the single-threaded
>> case if I explicitly leave out the CPUID core serializing instruction
>> when doing code modification on upcoming code, in a loop.
>>
>> AFAIU, Intel requires a core serializing instruction to be issued even
>> in single-threaded scenarios between code update and execution, to ensure
>> that speculative execution does not observe incoherent code. Now the
>> question we all have for Intel is: is this requirement too strong, or
>> required by reality ?
>>
> 
> In single-threaded execution, a jump is enough.
> 
> "As processor microarchitectures become more complex and start to
> speculatively execute code ahead of the retirement point (as in P6 and
> more recent processor families), the rules regarding which code should
> execute, pre- or post-modification, become blurred. To write
> self-modifying code and ensure that it is compliant with current and
> future versions of the IA-32 architectures, use one of the following
> coding options:
> 
> (* OPTION 1 *)
> Store modified code (as data) into code segment;
> Jump to new code or an intermediate location;
> Execute new code;"

Good point, so this is likely why I was having trouble reproducing the
single-threaded self-modifying code incoherent case. I did have a branch
in there.

Thanks,

Mathieu


-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com


* Re: [RFC PATCH 0/2] x86: Fix missing core serialization on migration
       [not found]                                     ` <1216732828.15017.1510679404571.JavaMail.zimbra-vg+e7yoeK/dWk0Htik3J/w@public.gmane.org>
@ 2017-11-14 17:31                                       ` Linus Torvalds
  0 siblings, 0 replies; 26+ messages in thread
From: Linus Torvalds @ 2017-11-14 17:31 UTC (permalink / raw)
  To: Mathieu Desnoyers
  Cc: Avi Kivity, Peter Zijlstra, Andy Lutomirski, linux-kernel,
	linux-api, Paul E. McKenney, Boqun Feng, Andrew Hunter,
	maged michael, Benjamin Herrenschmidt, Paul Mackerras,
	Michael Ellerman, Dave Watson, Thomas Gleixner, Ingo Molnar,
	H. Peter Anvin, Andrea Parri, Russell King, ARM Linux,
	Greg Hackmann, Will

On Tue, Nov 14, 2017 at 9:10 AM, Mathieu Desnoyers
<mathieu.desnoyers-vg+e7yoeK/dWk0Htik3J/w@public.gmane.org> wrote:
>> (* OPTION 1 *)
>> Store modified code (as data) into code segment;
>> Jump to new code or an intermediate location;
>> Execute new code;"
>
> Good point, so this is likely why I was having trouble reproducing the
> single-threaded self-modifying code incoherent case. I did have a branch
> in there.

Actually, even *without* the branch, Intel has been very good at
having precise I$ coherency. I think you can literally store to the
next instruction, and Intel CPUs after the Pentium Pro would notice,
take a micro-fault, and handle it correctly (the i486 and Pentium did
not have that level of coherency, but a taken branch would flush the
fetch buffer).

An in-order Atom probably has the old Pentium behavior, and you
could see it there.

But starting with the P6, and OoO execution, the "taken branch" thing
meant very little, so Intel started instead just doing the
"store-vs-instruction fetch" coherency explicitly, which causes it to
be precise.

Afaik, the only way to show incoherent I$ fairly easily is to use
virtual aliasing, and store to a different virtual address, because
the fetch buffer coherency is done by virtual address.

But even then, it's only the fetch buffer (and it's been called
different things over the years, now it's a uop loop cache), not the
L1 caches, so you get a very limited window of instructions.

And that fetch buffer is also where any cross-cpu incoherency would
be, for the exact same reason.
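The following sketches the virtual-aliasing setup described above. It assumes Linux on x86-64 and a /tmp that allows executable shared mappings. alias_demo() is a hypothetical name. One file page is mapped twice: writable at one virtual address and executable at another. The patch is therefore stored through a different linear address from the one used for fetching. This is not a reliable reproducer, because the window is limited to the fetch buffer. Intel's SDM also expects software that modifies code through a different linear address to execute a serializing instruction such as CPUID before running it, and this sketch deliberately omits that step.

```c
#define _GNU_SOURCE
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

#if defined(__x86_64__)
typedef int (*alias_fn)(void);

/* Hypothetical: map one file page twice (an RW alias and an RX alias at
 * different virtual addresses), patch through rw, execute through rx. */
static int alias_demo(int *before, int *after)
{
    char path[] = "/tmp/smc-alias-XXXXXX";
    int fd = mkstemp(path);
    if (fd < 0)
        return -1;
    unlink(path);                          /* keep only the descriptor */

    long page = sysconf(_SC_PAGESIZE);
    if (ftruncate(fd, page) != 0) {
        close(fd);
        return -1;
    }
    unsigned char *rw = mmap(NULL, (size_t)page, PROT_READ | PROT_WRITE,
                             MAP_SHARED, fd, 0);
    unsigned char *rx = mmap(NULL, (size_t)page, PROT_READ | PROT_EXEC,
                             MAP_SHARED, fd, 0);
    close(fd);                             /* the mappings keep the page alive */
    if (rw == MAP_FAILED || rx == MAP_FAILED) {
        if (rw != MAP_FAILED)
            munmap(rw, (size_t)page);
        if (rx != MAP_FAILED)
            munmap(rx, (size_t)page);
        return -1;
    }

    /* mov eax, 1 ; ret */
    static const unsigned char tmpl[] = { 0xb8, 1, 0, 0, 0, 0xc3 };
    memcpy(rw, tmpl, sizeof tmpl);
    alias_fn fn = (alias_fn)(void *)rx;
    *before = fn();

    uint32_t two = 2;
    memcpy(rw + 1, &two, sizeof two);      /* store via the RW alias only */
    *after = fn();                         /* fetch via the RX alias; no CPUID */

    munmap(rw, (size_t)page);
    munmap(rx, (size_t)page);
    return 0;
}
#else
static int alias_demo(int *before, int *after)
{
    (void)before;
    (void)after;
    return -1;  /* sketch only encodes x86-64 instructions */
}
#endif
```

In practice *after is expected to be 2. The sketch only requires that it be one of the two code versions.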

          Linus


* Re: [RFC PATCH 0/2] x86: Fix missing core serialization on migration
       [not found]                         ` <20171114160541.GC3165-IIpfhp3q70x9+YH6RuovlLjjLBE8jN/0@public.gmane.org>
  2017-11-14 16:08                           ` Peter Zijlstra
@ 2017-11-14 16:10                           ` Andy Lutomirski
       [not found]                             ` <CALCETrVpBocmrd+R5-R-d+QBvp6h8iZkjo7Xjy6V6x1rPfh25w-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  1 sibling, 1 reply; 26+ messages in thread
From: Andy Lutomirski @ 2017-11-14 16:10 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Mathieu Desnoyers, Avi Kivity, Linus Torvalds, Andy Lutomirski,
	linux-kernel, linux-api, Paul E. McKenney, Boqun Feng,
	Andrew Hunter, maged michael, Benjamin Herrenschmidt,
	Paul Mackerras, Michael Ellerman, Dave Watson, Thomas Gleixner,
	Ingo Molnar, H. Peter Anvin, Andrea Parri,
	Russell King, ARM Linux

On Tue, Nov 14, 2017 at 8:05 AM, Peter Zijlstra <peterz-wEGCiKHe2LqWVfeAwA7xHQ@public.gmane.org> wrote:
> On Tue, Nov 14, 2017 at 03:17:12PM +0000, Mathieu Desnoyers wrote:
>> I've tried to create a small single-threaded self-modifying loop in
>> user-space to trigger a trace cache or speculative execution quirk,
>> but I have not succeeded yet. I suspect that I would need to know
>> more about the internals of the processor architecture to create the
>> right stalls that would allow speculative execution to move further
>> ahead, and trigger an incoherent execution flow. Ideas on how to
>> trigger this would be welcome.
>
> I thought the whole problem was per definition multi-threaded.
>
> Single-threaded stuff can't get out of sync with itself; you'll always
> observe your own stores.
>
> And ISTR the JIT scenario being something like the JIT overwriting
> previously executed but supposedly no longer used code. And in this
> scenario you'd want to guarantee all CPUs observe the new code before
> jumping into it.
>
> The current approach is using mprotect(), except that on a number of
> platforms the TLB invalidate from that is not guaranteed to be strong
> enough to sync for code changes.
>
> On x86 the mprotect() should work just fine, since we broadcast IPIs for
> the TLB invalidate and the IRET from those will get the things synced up
> again (if nothing else; very likely we'll have done a MOV-CR3 which will
> of course also have sufficient syncness on it).
>
> But PowerPC, s390, ARM et al that do TLB invalidates without interrupts
> and don't guarantee their TLB invalidate sync against execution units
> are left broken by this scheme.
>

On x86 single-thread, you can still get in trouble, I think.  Do a
store, get migrated, execute the stored code.  There's no actual
guarantee that the new CPU does a CR3 load due to laziness.
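Here is a user-space sketch of the store / migrate / execute sequence. It assumes Linux on x86-64 with at least two CPUs in the thread's affinity mask. Forcing the move with sched_setaffinity() stands in for the scheduler migrating the thread on its own. store_migrate_execute() and its helpers are hypothetical names. Whether the resumed thread passed through a core serializing instruction on the new CPU depends on the kernel's return path, which is the question this thread is about, and the sketch cannot observe that directly.

```c
#define _GNU_SOURCE
#include <sched.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

#if defined(__x86_64__)
typedef int (*mig_fn)(void);

/* Pick the first two CPUs in *set; -1 if there are fewer than two. */
static int pick_two_cpus(const cpu_set_t *set, int *a, int *b)
{
    int found = 0;
    for (int cpu = 0; cpu < CPU_SETSIZE && found < 2; cpu++) {
        if (CPU_ISSET(cpu, set)) {
            if (found == 0)
                *a = cpu;
            else
                *b = cpu;
            found++;
        }
    }
    return found == 2 ? 0 : -1;
}

/* Restrict the calling thread to one CPU; the kernel moves it there
 * before sched_setaffinity() returns. */
static int pin_to(int cpu)
{
    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(cpu, &set);
    return sched_setaffinity(0, sizeof set, &set);
}

/* Hypothetical: store code on one CPU, migrate, execute on another. */
static int store_migrate_execute(int *result)
{
    cpu_set_t saved;
    int c0, c1;
    if (sched_getaffinity(0, sizeof saved, &saved) != 0 ||
        pick_two_cpus(&saved, &c0, &c1) != 0)
        return -1;

    long page = sysconf(_SC_PAGESIZE);
    unsigned char *code = mmap(NULL, (size_t)page,
                               PROT_READ | PROT_WRITE | PROT_EXEC,
                               MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (code == MAP_FAILED)
        return -1;

    /* mov eax, 7 ; ret */
    static const unsigned char tmpl[] = { 0xb8, 7, 0, 0, 0, 0xc3 };
    int rc = -1;
    if (pin_to(c0) == 0) {
        memcpy(code, tmpl, sizeof tmpl);           /* store on CPU c0 */
        if (pin_to(c1) == 0) {                     /* migrate to CPU c1 */
            *result = ((mig_fn)(void *)code)();    /* execute on CPU c1 */
            rc = 0;
        }
    }
    (void)sched_setaffinity(0, sizeof saved, &saved); /* restore the mask */
    munmap(code, (size_t)page);
    return rc;
}
#else
static int store_migrate_execute(int *result)
{
    (void)result;
    return -1;  /* sketch only encodes x86-64 instructions */
}
#endif
```

On a system with two usable CPUs, *result is expected to be 7. A return of -1 signals that either the migration or the executable mapping was not possible.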


* Re: [RFC PATCH 0/2] x86: Fix missing core serialization on migration
       [not found]                             ` <CALCETrVpBocmrd+R5-R-d+QBvp6h8iZkjo7Xjy6V6x1rPfh25w-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2017-11-14 16:13                               ` Thomas Gleixner
  2017-11-14 16:16                                 ` Andy Lutomirski
  0 siblings, 1 reply; 26+ messages in thread
From: Thomas Gleixner @ 2017-11-14 16:13 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Peter Zijlstra, Mathieu Desnoyers, Avi Kivity, Linus Torvalds,
	linux-kernel, linux-api, Paul E. McKenney, Boqun Feng,
	Andrew Hunter, maged michael, Benjamin Herrenschmidt,
	Paul Mackerras, Michael Ellerman, Dave Watson, Ingo Molnar,
	H. Peter Anvin, Andrea Parri, Russell King, ARM Linux,
	Greg Hackmann

On Tue, 14 Nov 2017, Andy Lutomirski wrote:
> On Tue, Nov 14, 2017 at 8:05 AM, Peter Zijlstra <peterz-wEGCiKHe2LqWVfeAwA7xHQ@public.gmane.org> wrote:
> > On Tue, Nov 14, 2017 at 03:17:12PM +0000, Mathieu Desnoyers wrote:
> >> I've tried to create a small single-threaded self-modifying loop in
> >> user-space to trigger a trace cache or speculative execution quirk,
> >> but I have not succeeded yet. I suspect that I would need to know
> >> more about the internals of the processor architecture to create the
> >> right stalls that would allow speculative execution to move further
> >> ahead, and trigger an incoherent execution flow. Ideas on how to
> >> trigger this would be welcome.
> >
> > I thought the whole problem was per definition multi-threaded.
> >
> > Single-threaded stuff can't get out of sync with itself; you'll always
> > observe your own stores.
> >
> > And ISTR the JIT scenario being something like the JIT overwriting
> > previously executed but supposedly no longer used code. And in this
> > scenario you'd want to guarantee all CPUs observe the new code before
> > jumping into it.
> >
> > The current approach is using mprotect(), except that on a number of
> > platforms the TLB invalidate from that is not guaranteed to be strong
> > enough to sync for code changes.
> >
> > On x86 the mprotect() should work just fine, since we broadcast IPIs for
> > the TLB invalidate and the IRET from those will get the things synced up
> > again (if nothing else; very likely we'll have done a MOV-CR3 which will
> > of course also have sufficient syncness on it).
> >
> > But PowerPC, s390, ARM et al that do TLB invalidates without interrupts
> > and don't guarantee their TLB invalidate sync against execution units
> > are left broken by this scheme.
> >
> 
> On x86 single-thread, you can still get in trouble, I think.  Do a
> store, get migrated, execute the stored code.  There's no actual
> guarantee that the new CPU does a CR3 load due to laziness.

The migration IPI will probably prevent that.

Thanks,

	tglx


* Re: [RFC PATCH 0/2] x86: Fix missing core serialization on migration
  2017-11-14 16:13                               ` Thomas Gleixner
@ 2017-11-14 16:16                                 ` Andy Lutomirski
       [not found]                                   ` <CALCETrXpR7ai047pHtdQe5J+FpuFO5ekeeEqLUt1wVLopyNt_Q-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  0 siblings, 1 reply; 26+ messages in thread
From: Andy Lutomirski @ 2017-11-14 16:16 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Andy Lutomirski, Peter Zijlstra, Mathieu Desnoyers, Avi Kivity,
	Linus Torvalds, linux-kernel, linux-api, Paul E. McKenney,
	Boqun Feng, Andrew Hunter, maged michael, Benjamin Herrenschmidt,
	Paul Mackerras, Michael Ellerman, Dave Watson, Ingo Molnar,
	H. Peter Anvin, Andrea Parri, Russell King, ARM Linux

On Tue, Nov 14, 2017 at 8:13 AM, Thomas Gleixner <tglx-hfZtesqFncYOwBW4kG4KsQ@public.gmane.org> wrote:
> On Tue, 14 Nov 2017, Andy Lutomirski wrote:
>> On Tue, Nov 14, 2017 at 8:05 AM, Peter Zijlstra <peterz-wEGCiKHe2LqWVfeAwA7xHQ@public.gmane.org> wrote:
>> > On Tue, Nov 14, 2017 at 03:17:12PM +0000, Mathieu Desnoyers wrote:
>> >> I've tried to create a small single-threaded self-modifying loop in
>> >> user-space to trigger a trace cache or speculative execution quirk,
>> >> but I have not succeeded yet. I suspect that I would need to know
>> >> more about the internals of the processor architecture to create the
>> >> right stalls that would allow speculative execution to move further
>> >> ahead, and trigger an incoherent execution flow. Ideas on how to
>> >> trigger this would be welcome.
>> >
>> > I thought the whole problem was per definition multi-threaded.
>> >
>> > Single-threaded stuff can't get out of sync with itself; you'll always
>> > observe your own stores.
>> >
>> > And ISTR the JIT scenario being something like the JIT overwriting
>> > previously executed but supposedly no longer used code. And in this
>> > scenario you'd want to guarantee all CPUs observe the new code before
>> > jumping into it.
>> >
>> > The current approach is using mprotect(), except that on a number of
>> > platforms the TLB invalidate from that is not guaranteed to be strong
>> > enough to sync for code changes.
>> >
>> > On x86 the mprotect() should work just fine, since we broadcast IPIs for
>> > the TLB invalidate and the IRET from those will get the things synced up
>> > again (if nothing else; very likely we'll have done a MOV-CR3 which will
>> > of course also have sufficient syncness on it).
>> >
>> > But PowerPC, s390, ARM et al that do TLB invalidates without interrupts
>> > and don't guarantee their TLB invalidate sync against execution units
>> > are left broken by this scheme.
>> >
>>
>> On x86 single-thread, you can still get in trouble, I think.  Do a
>> store, get migrated, execute the stored code.  There's no actual
>> guarantee that the new CPU does a CR3 load due to laziness.
>
> The migration IPI will probably prevent that.

What guarantees that there's an IPI?  Do we never do a syscall, get
migrated during syscall processing (due to cond_resched(), for
example), and land on another CPU that just happened to already be
scheduling?

--Andy

* Re: [RFC PATCH 0/2] x86: Fix missing core serialization on migration
@ 2017-11-14 16:31                                     ` Peter Zijlstra
From: Peter Zijlstra @ 2017-11-14 16:31 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Thomas Gleixner, Mathieu Desnoyers, Avi Kivity, Linus Torvalds,
	linux-kernel, linux-api, Paul E. McKenney, Boqun Feng,
	Andrew Hunter, maged michael, Benjamin Herrenschmidt,
	Paul Mackerras, Michael Ellerman, Dave Watson, Ingo Molnar,
	H. Peter Anvin, Andrea Parri, Russell King, ARM Linux,
	Greg Hackmann

On Tue, Nov 14, 2017 at 08:16:09AM -0800, Andy Lutomirski wrote:
> What guarantees that there's an IPI?  Do we never do a syscall, get
> migrated during syscall processing (due to cond_resched(), for
> example), and land on another CPU that just happened to already be
> scheduling?

Possible, the other CPU could've pulled the task because it went idle.
No IPIs involved in that scenario.

And if it was running a different thread of the same process prior to
that, we'll also not do switch_mm().

So yes, it is possible to construct a migration scenario without core
serializing instructions (of the CPUID/MOV-CR kind, not the LOCK prefix
kind).
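To make the distinction between the two kinds of instruction concrete, here is a hedged user-space sketch. CPUID is documented as a serializing instruction on x86 and can be issued from ring 3 through the `__cpuid` macro in GCC/Clang's `<cpuid.h>`; a LOCK-prefixed read-modify-write (what `__atomic_fetch_add` compiles to on x86) orders memory but does not serialize instruction fetch. MOV to CR3 is privileged and cannot be shown from user space. The function names are hypothetical, and the non-x86 fallback is only a memory fence, not an equivalent.

```c
#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>
#endif

/*
 * Issue CPUID, a serializing instruction that is usable from user space.
 * Returns EAX of leaf 0 (the highest basic leaf) only so the call has an
 * observable result.
 */
static unsigned int user_sync_core(void)
{
#if defined(__x86_64__) || defined(__i386__)
    unsigned int eax, ebx, ecx, edx;
    __cpuid(0, eax, ebx, ecx, edx);
    (void)ebx;
    (void)ecx;
    (void)edx;
    return eax;
#else
    /* Other architectures: a full memory fence only. This does NOT
     * synchronize the instruction stream. */
    __sync_synchronize();
    return 0;
#endif
}

/*
 * A LOCK-prefixed read-modify-write on x86 (e.g. LOCK XADD): a full
 * memory barrier, but not core serializing in the CPUID/MOV-CR sense.
 */
static int locked_increment(int *counter)
{
    return __atomic_fetch_add(counter, 1, __ATOMIC_SEQ_CST);
}
```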

Note that that still requires a multi-threaded process.

There is another scenario, where the NOHZ load balancer moves the task
such that the NOHZ load-balancing CPU is a third CPU. In that case there
is an interrupt (to effect the load balancing), but it will not land on
the CPU that's going to run the task.

This could happen for a single-threaded task, since I suppose the NOHZ
idle CPU that's going to be the victim could have run our task last and
still lazily have the mm.

Very tricky to make work, not to mention that I suspect actually going
idle will kill a whole bunch of state real quick.
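The scenarios above can be condensed into a deliberately simplified, hypothetical model that tracks only the two serialization sources named in this thread: an interrupt delivered to the destination CPU, and a CR3 write from switch_mm(). The enum values and `struct mm_model` are illustrative stand-ins, not kernel types, and the model ignores everything else the destination CPU might execute on its way back to user space.

```c
#include <stdbool.h>

enum migration_path {
    MIG_IPI_TO_DEST,    /* the destination CPU receives an interrupt (IPI) */
    MIG_PULLED_BY_IDLE, /* the destination CPU went idle and pulled the task */
    MIG_NOHZ_THIRD_CPU, /* a third CPU (the NOHZ balancer) moved the task */
};

/* Stand-in for struct mm_struct; only pointer identity matters here. */
struct mm_model {
    int id;
};

/* Only the first path interrupts the destination CPU (interrupt return
 * is serializing). Assumes the IPI is really sent; it may be elided if
 * the target is polling, as discussed later in the thread. */
static bool dest_takes_interrupt(enum migration_path path)
{
    return path == MIG_IPI_TO_DEST;
}

/* switch_mm() and its CR3 write are skipped when the destination CPU
 * already has the task's mm loaded: it ran a sibling thread, or it kept
 * the mm lazily while idle. */
static bool dest_writes_cr3(const struct mm_model *dest_active_mm,
                            const struct mm_model *task_mm)
{
    return dest_active_mm != task_mm;
}

static bool serialization_guaranteed(enum migration_path path,
                                     const struct mm_model *dest_active_mm,
                                     const struct mm_model *task_mm)
{
    return dest_takes_interrupt(path) ||
           dest_writes_cr3(dest_active_mm, task_mm);
}
```

In this model, `serialization_guaranteed(MIG_NOHZ_THIRD_CPU, &mm, &mm)` is false, matching the single-threaded NOHZ case where the idle victim CPU still lazily holds the task's mm.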

* Re: [RFC PATCH 0/2] x86: Fix missing core serialization on migration
@ 2017-11-14 17:17                                         ` Daniel Bristot de Oliveira
From: Daniel Bristot de Oliveira @ 2017-11-14 17:17 UTC (permalink / raw)
  To: Peter Zijlstra, Andy Lutomirski
  Cc: Thomas Gleixner, Mathieu Desnoyers, Avi Kivity, Linus Torvalds,
	linux-kernel, linux-api, Paul E. McKenney, Boqun Feng,
	Andrew Hunter, maged michael, Benjamin Herrenschmidt,
	Paul Mackerras, Michael Ellerman, Dave Watson, Ingo Molnar,
	H. Peter Anvin, Andrea Parri, Russell King, ARM Linux,
	Greg Hackmann

On 11/14/2017 05:31 PM, Peter Zijlstra wrote:
> On Tue, Nov 14, 2017 at 08:16:09AM -0800, Andy Lutomirski wrote:
>> What guarantees that there's an IPI?  Do we never do a syscall, get
>> migrated during syscall processing (due to cond_resched(), for
>> example), and land on another CPU that just happened to already be
>> scheduling?
> 
> Possible, the other CPU could've pulled the task because it went idle.
> No IPIs involved in that scenario.
> 
> And if it was running a different thread of the same process prior to
> that, we'll also not do switch_mm().
> 
> So yes, it is possible to construct a migration scenario without core
> serializing instructions (of the CPUID/MOV-CR kind, not the LOCK prefix
> kind).
> 
> Note that that still requires a multi-threaded process.
> 
> There is another scenario, where the NOHZ load balancer moves the task
> such that the NOHZ load-balancing CPU is a third CPU. In that case there
> is an interrupt (to effect the load balancing), but it will not land on
> the CPU that's going to run the task.
> 
> This could happen for a single-threaded task, since I suppose the NOHZ
> idle CPU that's going to be the victim could have run our task last and
> still lazily have the mm.
> 
> Very tricky to make work, not to mention that I suspect actually going
> idle will kill a whole bunch of state real quick.
> 

IIRC, if the dest CPU is idle and the system is running with idle=poll,
no IPI is fired either, but that is not a very common case.

-- Daniel

* Re: [RFC PATCH 0/2] x86: Fix missing core serialization on migration
@ 2017-11-14 17:40                                             ` Peter Zijlstra
From: Peter Zijlstra @ 2017-11-14 17:40 UTC (permalink / raw)
  To: Daniel Bristot de Oliveira
  Cc: Andy Lutomirski, Thomas Gleixner, Mathieu Desnoyers, Avi Kivity,
	Linus Torvalds, linux-kernel, linux-api, Paul E. McKenney,
	Boqun Feng, Andrew Hunter, maged michael, Benjamin Herrenschmidt,
	Paul Mackerras, Michael Ellerman, Dave Watson, Ingo Molnar,
	H. Peter Anvin, Andrea Parri, Russell King, ARM Linux

On Tue, Nov 14, 2017 at 06:17:13PM +0100, Daniel Bristot de Oliveira wrote:

> IIRC, if the dest CPU is idle and the system is running with idle=poll,
> no IPI is fired either, but that is not a very common case.

You're thinking about wake from idle? That is almost always without IPI,
even without idle=poll.

* Re: [RFC PATCH 0/2] x86: Fix missing core serialization on migration
@ 2017-11-14 18:01                                                 ` Daniel Bristot de Oliveira
From: Daniel Bristot de Oliveira @ 2017-11-14 18:01 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Andy Lutomirski, Thomas Gleixner, Mathieu Desnoyers, Avi Kivity,
	Linus Torvalds, linux-kernel, linux-api, Paul E. McKenney,
	Boqun Feng, Andrew Hunter, maged michael, Benjamin Herrenschmidt,
	Paul Mackerras, Michael Ellerman, Dave Watson, Ingo Molnar,
	H. Peter Anvin, Andrea Parri, Russell King, ARM Linux

On 11/14/2017 06:40 PM, Peter Zijlstra wrote:
> On Tue, Nov 14, 2017 at 06:17:13PM +0100, Daniel Bristot de Oliveira wrote:
> 
>> IIRC, if the dest CPU is idle and the system is running with idle=poll,
>> no IPI is fired either, but that is not a very common case.
> 
> You're thinking about wake from idle? That is almost always without IPI,
> even without idle=poll.
> 

I meant the resched_curr(rq) of an rq on another CPU. If the dest is
idle && idle=poll, the IPI will not be sent.

-- Daniel

* Re: [RFC PATCH 0/2] x86: Fix missing core serialization on migration
@ 2017-11-14 18:17                                                     ` Peter Zijlstra
From: Peter Zijlstra @ 2017-11-14 18:17 UTC (permalink / raw)
  To: Daniel Bristot de Oliveira
  Cc: Andy Lutomirski, Thomas Gleixner, Mathieu Desnoyers, Avi Kivity,
	Linus Torvalds, linux-kernel, linux-api, Paul E. McKenney,
	Boqun Feng, Andrew Hunter, maged michael, Benjamin Herrenschmidt,
	Paul Mackerras, Michael Ellerman, Dave Watson, Ingo Molnar,
	H. Peter Anvin, Andrea Parri, Russell King, ARM Linux

On Tue, Nov 14, 2017 at 07:01:55PM +0100, Daniel Bristot de Oliveira wrote:
> On 11/14/2017 06:40 PM, Peter Zijlstra wrote:
> > On Tue, Nov 14, 2017 at 06:17:13PM +0100, Daniel Bristot de Oliveira wrote:
> > 
> >> IIRC, if the dest CPU is idle and the system is running with idle=poll,
> >> no IPI is fired either, but that is not a very common case.
> > 
> > You're thinking about wake from idle? That is almost always without IPI,
> > even without idle=poll.
> > 
> 
> I meant the resched_curr(rq) of an rq on another CPU. If the dest is
> idle && idle=poll, the IPI will not be sent.

I'm saying the IPI will not be sent even without idle=poll. MWAIT-based
idle will also have TIF_POLLING_NRFLAG set.
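The IPI elision described here can be modeled in user space with C11 atomics. The sketch is modeled on the scheduler's `set_nr_and_not_polling()` helper: it atomically sets the need-resched bit and requests an IPI only if the target's flags did not already have the polling bit set. The `MODEL_*` bit values, `struct thread_info_model` and the IPI counter are hypothetical stand-ins. As far as we understand, MWAIT-based idle arms its monitor on the thread's flags word, so the store alone wakes the idle CPU and no interrupt (and hence no interrupt return) takes place.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical bit assignments; the real TIF_* values are per-architecture. */
#define MODEL_TIF_NEED_RESCHED   (1u << 0)
#define MODEL_TIF_POLLING_NRFLAG (1u << 1)

/* Stand-in for the flags word in struct thread_info. */
struct thread_info_model {
    atomic_uint flags;
};

/*
 * Modeled on set_nr_and_not_polling(): set NEED_RESCHED atomically and
 * report whether the target was NOT polling, i.e. whether an IPI is
 * needed for it to notice the request.
 */
static bool set_nr_and_not_polling_model(struct thread_info_model *ti)
{
    unsigned int old = atomic_fetch_or(&ti->flags, MODEL_TIF_NEED_RESCHED);
    return !(old & MODEL_TIF_POLLING_NRFLAG);
}

/*
 * Simplified remote resched_curr(): an IPI (and thus an interrupt return
 * on the target) only happens when the target is not polling its flags.
 */
static bool resched_remote_model(struct thread_info_model *target,
                                 unsigned int *ipis_sent)
{
    bool need_ipi = set_nr_and_not_polling_model(target);
    if (need_ipi)
        (*ipis_sent)++; /* stand-in for smp_send_reschedule() */
    return need_ipi;
}
```

With a polling target the call returns false and only sets the need-resched bit; with a non-polling target it returns true and counts one IPI.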

* Re: [RFC PATCH 0/2] x86: Fix missing core serialization on migration
@ 2017-11-14 18:24                                                         ` Daniel Bristot de Oliveira
From: Daniel Bristot de Oliveira @ 2017-11-14 18:24 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Andy Lutomirski, Thomas Gleixner, Mathieu Desnoyers, Avi Kivity,
	Linus Torvalds, linux-kernel, linux-api, Paul E. McKenney,
	Boqun Feng, Andrew Hunter, maged michael, Benjamin Herrenschmidt,
	Paul Mackerras, Michael Ellerman, Dave Watson, Ingo Molnar,
	H. Peter Anvin, Andrea Parri, Russell King, ARM Linux

On 11/14/2017 07:17 PM, Peter Zijlstra wrote:
> On Tue, Nov 14, 2017 at 07:01:55PM +0100, Daniel Bristot de Oliveira wrote:
>> On 11/14/2017 06:40 PM, Peter Zijlstra wrote:
>>> On Tue, Nov 14, 2017 at 06:17:13PM +0100, Daniel Bristot de Oliveira wrote:
>>>
>>>> IIRC, if the dest CPU is idle and the system is running with idle=poll,
>>>> no IPI is fired either, but that is not a very common case.
>>>
>>> You're thinking about wake from idle? That is almost always without IPI,
>>> even without idle=poll.
>>>
>>
>> I meant the resched_curr(rq) of an rq on another CPU. If the dest is
>> idle && idle=poll, the IPI will not be sent.
> 
> I'm saying the IPI will not be sent even without idle=poll. MWAIT-based
> idle will also have TIF_POLLING_NRFLAG set.
> 

Yeah, you are right! I missed that point... sorry :-)

-- Daniel

end of thread

Thread overview: 26+ messages
2017-11-10 21:12 [RFC PATCH 0/2] x86: Fix missing core serialization on migration Mathieu Desnoyers
2017-11-10 21:12   ` [RFC PATCH 1/2] x86: Introduce sync_core_before_usermode Mathieu Desnoyers
2017-11-10 21:12   ` [RFC PATCH 2/2] Fix: x86: Add missing core serializing instruction on migration Mathieu Desnoyers
2017-11-10 21:36   ` [RFC PATCH 0/2] x86: Fix missing core serialization " Linus Torvalds
2017-11-10 21:57       ` Mathieu Desnoyers
2017-11-10 22:12           ` Linus Torvalds
2017-11-13 16:56           ` Mathieu Desnoyers
2017-11-13 17:14               ` Linus Torvalds
2017-11-14 14:53               ` Avi Kivity
2017-11-14 15:17                   ` Mathieu Desnoyers
2017-11-14 15:42                       ` Avi Kivity
2017-11-14 16:05                       ` Peter Zijlstra
2017-11-14 16:08                           ` Peter Zijlstra
2017-11-14 16:49                             ` Mathieu Desnoyers
2017-11-14 17:03                               ` Avi Kivity
2017-11-14 17:10                                   ` Mathieu Desnoyers
2017-11-14 17:31                                       ` Linus Torvalds
2017-11-14 16:10                           ` Andy Lutomirski
2017-11-14 16:13                               ` Thomas Gleixner
2017-11-14 16:16                                 ` Andy Lutomirski
2017-11-14 16:31                                     ` Peter Zijlstra
2017-11-14 17:17                                         ` Daniel Bristot de Oliveira
2017-11-14 17:40                                             ` Peter Zijlstra
2017-11-14 18:01                                                 ` Daniel Bristot de Oliveira
2017-11-14 18:17                                                     ` Peter Zijlstra
2017-11-14 18:24                                                         ` Daniel Bristot de Oliveira
