All of lore.kernel.org
 help / color / mirror / Atom feed
From: Mathieu Desnoyers <mathieu.desnoyers-vg+e7yoeK/dWk0Htik3J/w@public.gmane.org>
To: Paul Turner <commonly-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
Cc: Peter Zijlstra <peterz-wEGCiKHe2LqWVfeAwA7xHQ@public.gmane.org>,
	Andrew Hunter <ahh-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org>,
	Andy Lutomirski <luto-kltTT9wpgjJwATOyAt5JVQ@public.gmane.org>,
	Andi Kleen <andi-Vw/NltI1exuRpAAqCnN02g@public.gmane.org>,
	Dave Watson <davejwatson-b10kYP2dOMg@public.gmane.org>,
	"Paul E. McKenney"
	<paulmck-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org>,
	linux-api <linux-api-u79uwXL29TY76Z2rM5mHXA@public.gmane.org>,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
	Josh Triplett <josh-iaAMLnmF4UmaiuxdJuQwMA@public.gmane.org>,
	Ingo Molnar <mingo-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>,
	Chris Lameter <cl-vYTEC60ixJUAvxtiuMwx3w@public.gmane.org>,
	Linus Torvalds
	<torvalds-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b@public.gmane.org>,
	"H. Peter Anvin" <hpa-YMNOUZJC4hwAvxtiuMwx3w@public.gmane.org>
Subject: Re: [RFC PATCH v2 2/3] restartable sequences: x86 ABI
Date: Fri, 11 Dec 2015 13:30:15 +0000 (UTC)	[thread overview]
Message-ID: <72792257.232485.1449840615047.JavaMail.zimbra@efficios.com> (raw)
In-Reply-To: <20151027235705.16059.63268.stgit-G8L5E6GV2z5XSTzz+wBt03oUN1GumTyQ7j82oEJ37pA@public.gmane.org>

----- On Oct 27, 2015, at 7:57 PM, Paul Turner commonly-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org wrote:

> From: Paul Turner <pjt-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org>
> 
> Recall the general ABI is:
>   The kernel ABI generally consists of:
>     a) A shared TLS word which exports the current cpu and event-count
>     b) A shared TLS word which, when non-zero, stores the first post-commit
>        instruction if a sequence is active.  (The kernel observing rips greater
>        than this takes no action and concludes user-space has completed its
>        critical section, but not yet cleared this state).
>     c) An architecture specific way to publish the state read from (a) which
>        user-space believes to be current.
> 
> This patch defines (c) for x86, both x86_64 and i386.

It seems to also take care of signal handler restarts (should be
documented in the changelog).

> 
> The exact sequence is:
>  *0. Userspace stores current event+cpu counter values
>   1. Userspace loads the rip to move to at failure into cx
>   2. Userspace loads the rip of the instruction following
>      the critical section into a registered TLS address.
>   3. Userspace loads the values read at [0] into a known
>      location.
>   4. Userspace tests to see whether the current event and
>      cpu counter values match those stored at 0.  Manually
>      jumping to the address from [1] in the case of a
>      mismatch.
> 
>      Note that if we are preempted or otherwise interrupted
>      then the kernel can also now perform this comparison
>      and conditionally jump us to [1].
>   4. Our final instruction bfeore [2] is then our commit.

bfeore -> before

>      The critical section is self-terminating.  [2] must
>      also be cleared at this point.

^ see comments for patch 0/3 for this repeated description.

> 
> For x86_64:
>   [3] uses rdx to represent cpu and event counter as a
>       single 64-bit value.
> 
> For i386:
>   [3] uses ax for cpu and dx for the event_counter.

^ again, see comments for patch 0/3.

> 
>  Both:
>   Instruction after commit: rseq_state->post_commit_instr
>   Current event and cpu state: rseq_state->event_and_cpu
> 
> An example user-space x86_64 implementation:
>    __asm__ __volatile__ goto (
>                    "movq $%l[failed], %%rcx\n"
>                    "movq $1f, %[commit_instr]\n"
>                    "cmpq %[start_value], %[current_value]\n"
>                    "jnz %l[failed]\n"
>                    "movq %[to_write], (%[target])\n"
>                    "1: movq $0, %[commit_instr]\n"
>      : /* no outputs */
>      : [start_value]"d"(start_value.storage),
>        [current_value]"m"(__rseq_state),
>        [to_write]"r"(to_write),
>        [target]"r"(p),
>        [commit_instr]"m"(__rseq_state.post_commit_instr)
>      : "rcx", "memory"
>      : failed
> 
> Signed-off-by: Paul Turner <pjt-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org>
> ---
> arch/x86/entry/common.c                      |    3 +
> arch/x86/include/asm/restartable_sequences.h |   18 +++
> arch/x86/kernel/Makefile                     |    2
> arch/x86/kernel/restartable_sequences.c      |  136 ++++++++++++++++++++++++++
> arch/x86/kernel/signal.c                     |    7 +
> kernel/restartable_sequences.c               |   10 +-
> 6 files changed, 173 insertions(+), 3 deletions(-)
> create mode 100644 arch/x86/include/asm/restartable_sequences.h
> create mode 100644 arch/x86/kernel/restartable_sequences.c
> 
> diff --git a/arch/x86/entry/common.c b/arch/x86/entry/common.c
> index a89fdbc..e382487 100644
> --- a/arch/x86/entry/common.c
> +++ b/arch/x86/entry/common.c
> @@ -23,6 +23,7 @@
> #include <linux/uprobes.h>
> 
> #include <asm/desc.h>
> +#include <asm/restartable_sequences.h>
> #include <asm/traps.h>
> #include <asm/vdso.h>
> #include <asm/uaccess.h>
> @@ -249,6 +250,8 @@ static void exit_to_usermode_loop(struct pt_regs *regs, u32
> cached_flags)
> 		if (cached_flags & _TIF_NOTIFY_RESUME) {
> 			clear_thread_flag(TIF_NOTIFY_RESUME);
> 			tracehook_notify_resume(regs);
> +			if (rseq_active(current))
> +				arch_rseq_update_event_counter(regs);
> 		}
> 
> 		if (cached_flags & _TIF_USER_RETURN_NOTIFY)
> diff --git a/arch/x86/include/asm/restartable_sequences.h
> b/arch/x86/include/asm/restartable_sequences.h
> new file mode 100644
> index 0000000..75864a7
> --- /dev/null
> +++ b/arch/x86/include/asm/restartable_sequences.h
> @@ -0,0 +1,18 @@
> +#ifndef _ASM_X86_RESTARTABLE_SEQUENCES_H
> +#define _ASM_X86_RESTARTABLE_SEQUENCES_H
> +
> +#include <asm/processor.h>
> +#include <asm/ptrace.h>
> +#include <linux/sched.h>
> +
> +#ifdef CONFIG_RESTARTABLE_SEQUENCES
> +
> +void arch_rseq_update_event_counter(struct pt_regs *regs);
> +
> +#else /* !CONFIG_RESTARTABLE_SEQUENCES */
> +
> +static inline void arch_rseq_update_event_counter(struct pt_regs *regs) {}
> +
> +#endif

for ifdef consistency, I would recommend Paul McKenney's approach:

#ifdef CONFIG_RESTARTABLE_SEQUENCES

#else /* #ifdef CONFIG_RESTARTABLE_SEQUENCES */

#endif /* #else #ifdef CONFIG_RESTARTABLE_SEQUENCES */


> +
> +#endif /* _ASM_X86_RESTARTABLE_SEQUENCES_H */
> diff --git a/arch/x86/kernel/Makefile b/arch/x86/kernel/Makefile
> index b1b78ff..ee98fb6 100644
> --- a/arch/x86/kernel/Makefile
> +++ b/arch/x86/kernel/Makefile
> @@ -110,6 +110,8 @@ obj-$(CONFIG_EFI)			+= sysfb_efi.o
> obj-$(CONFIG_PERF_EVENTS)		+= perf_regs.o
> obj-$(CONFIG_TRACING)			+= tracepoint.o
> 
> +obj-$(CONFIG_RESTARTABLE_SEQUENCES)	+= restartable_sequences.o
> +
> ###
> # 64 bit specific files
> ifeq ($(CONFIG_X86_64),y)
> diff --git a/arch/x86/kernel/restartable_sequences.c
> b/arch/x86/kernel/restartable_sequences.c
> new file mode 100644
> index 0000000..9f43efd
> --- /dev/null
> +++ b/arch/x86/kernel/restartable_sequences.c
> @@ -0,0 +1,136 @@
> +/*
> + * Restartable Sequences: x86 ABI.
> + *
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License as published by
> + * the Free Software Foundation; either version 2 of the License, or
> + * (at your option) any later version.
> + *
> + * This program is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> + * GNU General Public License for more details.
> + *
> + * You should have received a copy of the GNU General Public License
> + * along with this program; if not, write to the Free Software
> + * Foundation, 51 Franklin Street, Fifth Floor, Boston, MA  02110-1301, USA.
> + *
> + * Copyright (C) 2015, Google, Inc.,
> + * Paul Turner <pjt-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org> and Andrew Hunter <ahh-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org>
> + *

Not sure why this empty line here ?

> + */
> +
> +#include <linux/sched.h>
> +#include <linux/uaccess.h>
> +#include <asm/ptrace.h>
> +#include <asm/processor-flags.h>
> +#include <asm/restartable_sequences.h>
> +
> +static inline u64 rseq_encode_cpu_and_event_count(int cpu, int event_count)
> +{
> +	return (u64)(event_count) << 32 | cpu;
> +}

Still wondering why those need to be combined.

> +
> +static inline int rseq_regs_cpu(struct pt_regs *regs, int is_i386)
> +{
> +#ifdef CONFIG_64BIT
> +	return is_i386 ? regs->ax : regs->dx & 0xFFFF;

Should it rather be ?

return is_i386 ? regs->ax : regs->dx & 0xFFFFFFFF;

> +#else
> +	return regs->ax;
> +#endif
> +}
> +
> +static inline int rseq_regs_event_count(struct pt_regs *regs, int is_i386)
> +{
> +#ifdef CONFIG_64BIT
> +	return is_i386 ? regs->dx : regs->dx >> 32;
> +#else
> +	return regs->dx;
> +#endif
> +}
> +
> +void arch_rseq_update_event_counter(struct pt_regs *regs)
> +{
> +	struct restartable_sequence_state *rseq_state = &current->rseq_state;
> +	int cpu = task_cpu(current);

Could we change task_cpu(current) for smp_processor_id() ?

> +	int is_i386 = test_tsk_thread_flag(current, TIF_IA32);

How should we consider x32 (is_x32_task()) ?

> +	int addr_size = is_i386 ? 4 : 8;

See other patch comment about using is_compat_task() and sizeof().

> +	long post_commit_instr = 0;
> +	u64 state_value;
> +
> +	/*
> +	 * Note: post_commit_instr must be zero-initialized above for the case
> +	 * of a 32-bit thread on a 64-bit system.
> +	 */
> +	if (copy_from_user(&post_commit_instr,
> +			   rseq_state->post_commit_instr_addr, addr_size)) {

Hrm, that's a little endian hack. Could this be cleaned up so
it won't break horribly when ported to big endian architectures ?

> +		goto fail;
> +	}
> +
> +	/* Handle potentially being within a critical section. */
> +	if (regs->ip < post_commit_instr) {

It appears that this mechanism is not responsible for handling
rseq that contain a call to sub-functions, given that those sub-functions
could be above post_commit_instr, and given that it would be incorrect
to move the instruction pointer outside of the scope of the inline assembly.
Are function calls supported within the rseq c.s. ?

> +		/*
> +		 * The ABI is relatively compact, with some differences for 32
> +		 * and 64-bit threads.
> +		 *
> +		 * Progress is ordered as follows:
> +		 *  *0. USerspace stores current event+cpu counter values

USerspace -> Userspace

> +		 *   1. Userspace loads the rip to move to at failure into cx
> +		 *   2. Userspace loads the rip of the instruction following
> +		 *      the critical section into a registered TLS address.
> +		 *   3. Userspace loads the values read at [0] into a known
> +		 *      location.
> +		 *   4. Userspace tests to see whether the current event and
> +		 *      cpu counter values match those stored at 0.  Manually
> +		 *      jumping to the address from [1] in the case of a
> +		 *      mismatch.
> +		 *
> +		 *      Note that if we are preempted or otherwise interrupted
> +		 *      then the kernel can also now perform this comparison
> +		 *      and conditionally jump us to [1].
> +		 *   4. Our final instruction bfeore [2] is then our commit.

bfeore -> before

> +		 *      The critical section is self-terminating.  [2] must
> +		 *      also be cleared at this point.
> +		 *
> +		 * For x86_64:
> +		 *   [3] uses rdx to represent cpu and event counter as a
> +		 *       single 64-bit value.
> +		 *
> +		 * For i386:
> +		 *   [3] uses ax for cpu and dx for the event_counter.
> +		 *
> +		 *  Both:
> +		 *   Instruction after commit: rseq_state->post_commit_instr
> +		 *   Current event and cpu state: rseq_state->event_and_cpu

^ see comments on patch 0/3 changelog.

> +		 *

Empty line to remove.

Thanks,

Mathieu

> +		 */
> +		if (rseq_regs_cpu(regs, is_i386) != cpu ||
> +		    rseq_regs_event_count(regs, is_i386) !=
> +				rseq_state->event_counter) {
> +			if (clear_user(rseq_state->post_commit_instr_addr,
> +						addr_size))
> +				goto fail;
> +
> +			/*
> +			 * We set this after potentially failing in clear_user
> +			 * so that the signal arrives at the faulting rip.
> +			 */
> +			regs->ip = regs->cx;
> +		}
> +	}
> +
> +	/* Update state. Compute the next expected state value. */
> +	state_value = rseq_encode_cpu_and_event_count(cpu,
> +			++rseq_state->event_counter);
> +
> +	if (put_user(state_value, rseq_state->event_and_cpu))
> +		goto fail;
> +
> +	return;
> +fail:
> +	/*
> +	 * User space has made some (invalid) protection change that does not
> +	 * allow us to safely continue execution.  SEGV is the result.
> +	 */
> +	force_sig(SIGSEGV, current);
> +}
> diff --git a/arch/x86/kernel/signal.c b/arch/x86/kernel/signal.c
> index b7ffb7c..58f8813 100644
> --- a/arch/x86/kernel/signal.c
> +++ b/arch/x86/kernel/signal.c
> @@ -30,6 +30,7 @@
> #include <asm/fpu/signal.h>
> #include <asm/vdso.h>
> #include <asm/mce.h>
> +#include <asm/restartable_sequences.h>
> #include <asm/sighandling.h>
> #include <asm/vm86.h>
> 
> @@ -615,6 +616,12 @@ setup_rt_frame(struct ksignal *ksig, struct pt_regs *regs)
> 	sigset_t *set = sigmask_to_save();
> 	compat_sigset_t *cset = (compat_sigset_t *) set;
> 
> +#ifdef CONFIG_RESTARTABLE_SEQUENCES
> +	/* Increment the event counter for the, present, pre-signal frame. */

There appears to be a lot of, commas, here. ;-)

> +	if (rseq_active(current))
> +		arch_rseq_update_event_counter(regs);
> +#endif
> +
> 	/* Set up the stack frame */
> 	if (is_ia32_frame()) {
> 		if (ksig->ka.sa.sa_flags & SA_SIGINFO)
> diff --git a/kernel/restartable_sequences.c b/kernel/restartable_sequences.c
> index c99a574..5449561 100644
> --- a/kernel/restartable_sequences.c
> +++ b/kernel/restartable_sequences.c
> @@ -24,17 +24,21 @@
> 
> #ifdef CONFIG_RESTARTABLE_SEQUENCES
> 
> +#include <asm/restartable_sequences.h>
> #include <linux/uaccess.h>
> #include <linux/preempt.h>
> #include <linux/syscalls.h>
> 
> static void rseq_sched_in_nop(struct preempt_notifier *pn, int cpu) {}
> -static void rseq_sched_out_nop(struct preempt_notifier *pn,
> -			       struct task_struct *next) {}
> +static void rseq_sched_out(struct preempt_notifier *pn,
> +			   struct task_struct *next)
> +{
> +	set_thread_flag(TIF_NOTIFY_RESUME);
> +}
> 
> static __read_mostly struct preempt_ops rseq_preempt_ops = {
> 	.sched_in = rseq_sched_in_nop,
> -	.sched_out = rseq_sched_out_nop,
> +	.sched_out = rseq_sched_out,
> };
> 
>  int rseq_configure_current(__user u64 *event_and_cpu,

-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com

WARNING: multiple messages have this Message-ID (diff)
From: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
To: Paul Turner <commonly@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>,
	Andrew Hunter <ahh@google.com>,
	Andy Lutomirski <luto@amacapital.net>,
	Andi Kleen <andi@firstfloor.org>,
	Dave Watson <davejwatson@fb.com>,
	"Paul E. McKenney" <paulmck@linux.vnet.ibm.com>,
	linux-api <linux-api@vger.kernel.org>,
	linux-kernel@vger.kernel.org,
	Josh Triplett <josh@joshtriplett.org>,
	Ingo Molnar <mingo@redhat.com>, Chris Lameter <cl@linux.com>,
	Linus Torvalds <torvalds@linux-foundation.org>,
	"H. Peter Anvin" <hpa@zytor.com>
Subject: Re: [RFC PATCH v2 2/3] restartable sequences: x86 ABI
Date: Fri, 11 Dec 2015 13:30:15 +0000 (UTC)	[thread overview]
Message-ID: <72792257.232485.1449840615047.JavaMail.zimbra@efficios.com> (raw)
In-Reply-To: <20151027235705.16059.63268.stgit@pjt-glaptop.roam.corp.google.com>

----- On Oct 27, 2015, at 7:57 PM, Paul Turner commonly@gmail.com wrote:

> From: Paul Turner <pjt@google.com>
> 
> Recall the general ABI is:
>   The kernel ABI generally consists of:
>     a) A shared TLS word which exports the current cpu and event-count
>     b) A shared TLS word which, when non-zero, stores the first post-commit
>        instruction if a sequence is active.  (The kernel observing rips greater
>        than this takes no action and concludes user-space has completed its
>        critical section, but not yet cleared this state).
>     c) An architecture specific way to publish the state read from (a) which
>        user-space believes to be current.
> 
> This patch defines (c) for x86, both x86_64 and i386.

It seems to also take care of signal handler restarts (should be
documented in the changelog).

> 
> The exact sequence is:
>  *0. Userspace stores current event+cpu counter values
>   1. Userspace loads the rip to move to at failure into cx
>   2. Userspace loads the rip of the instruction following
>      the critical section into a registered TLS address.
>   3. Userspace loads the values read at [0] into a known
>      location.
>   4. Userspace tests to see whether the current event and
>      cpu counter values match those stored at 0.  Manually
>      jumping to the address from [1] in the case of a
>      mismatch.
> 
>      Note that if we are preempted or otherwise interrupted
>      then the kernel can also now perform this comparison
>      and conditionally jump us to [1].
>   4. Our final instruction bfeore [2] is then our commit.

bfeore -> before

>      The critical section is self-terminating.  [2] must
>      also be cleared at this point.

^ see comments for patch 0/3 for this repeated description.

> 
> For x86_64:
>   [3] uses rdx to represent cpu and event counter as a
>       single 64-bit value.
> 
> For i386:
>   [3] uses ax for cpu and dx for the event_counter.

^ again, see comments for patch 0/3.

> 
>  Both:
>   Instruction after commit: rseq_state->post_commit_instr
>   Current event and cpu state: rseq_state->event_and_cpu
> 
> An example user-space x86_64 implementation:
>    __asm__ __volatile__ goto (
>                    "movq $%l[failed], %%rcx\n"
>                    "movq $1f, %[commit_instr]\n"
>                    "cmpq %[start_value], %[current_value]\n"
>                    "jnz %l[failed]\n"
>                    "movq %[to_write], (%[target])\n"
>                    "1: movq $0, %[commit_instr]\n"
>      : /* no outputs */
>      : [start_value]"d"(start_value.storage),
>        [current_value]"m"(__rseq_state),
>        [to_write]"r"(to_write),
>        [target]"r"(p),
>        [commit_instr]"m"(__rseq_state.post_commit_instr)
>      : "rcx", "memory"
>      : failed
> 
> Signed-off-by: Paul Turner <pjt@google.com>
> ---
> arch/x86/entry/common.c                      |    3 +
> arch/x86/include/asm/restartable_sequences.h |   18 +++
> arch/x86/kernel/Makefile                     |    2
> arch/x86/kernel/restartable_sequences.c      |  136 ++++++++++++++++++++++++++
> arch/x86/kernel/signal.c                     |    7 +
> kernel/restartable_sequences.c               |   10 +-
> 6 files changed, 173 insertions(+), 3 deletions(-)
> create mode 100644 arch/x86/include/asm/restartable_sequences.h
> create mode 100644 arch/x86/kernel/restartable_sequences.c
> 
> diff --git a/arch/x86/entry/common.c b/arch/x86/entry/common.c
> index a89fdbc..e382487 100644
> --- a/arch/x86/entry/common.c
> +++ b/arch/x86/entry/common.c
> @@ -23,6 +23,7 @@
> #include <linux/uprobes.h>
> 
> #include <asm/desc.h>
> +#include <asm/restartable_sequences.h>
> #include <asm/traps.h>
> #include <asm/vdso.h>
> #include <asm/uaccess.h>
> @@ -249,6 +250,8 @@ static void exit_to_usermode_loop(struct pt_regs *regs, u32
> cached_flags)
> 		if (cached_flags & _TIF_NOTIFY_RESUME) {
> 			clear_thread_flag(TIF_NOTIFY_RESUME);
> 			tracehook_notify_resume(regs);
> +			if (rseq_active(current))
> +				arch_rseq_update_event_counter(regs);
> 		}
> 
> 		if (cached_flags & _TIF_USER_RETURN_NOTIFY)
> diff --git a/arch/x86/include/asm/restartable_sequences.h
> b/arch/x86/include/asm/restartable_sequences.h
> new file mode 100644
> index 0000000..75864a7
> --- /dev/null
> +++ b/arch/x86/include/asm/restartable_sequences.h
> @@ -0,0 +1,18 @@
> +#ifndef _ASM_X86_RESTARTABLE_SEQUENCES_H
> +#define _ASM_X86_RESTARTABLE_SEQUENCES_H
> +
> +#include <asm/processor.h>
> +#include <asm/ptrace.h>
> +#include <linux/sched.h>
> +
> +#ifdef CONFIG_RESTARTABLE_SEQUENCES
> +
> +void arch_rseq_update_event_counter(struct pt_regs *regs);
> +
> +#else /* !CONFIG_RESTARTABLE_SEQUENCES */
> +
> +static inline void arch_rseq_update_event_counter(struct pt_regs *regs) {}
> +
> +#endif

for ifdef consistency, I would recommend Paul McKenney's approach:

#ifdef CONFIG_RESTARTABLE_SEQUENCES

#else /* #ifdef CONFIG_RESTARTABLE_SEQUENCES */

#endif /* #else #ifdef CONFIG_RESTARTABLE_SEQUENCES */


> +
> +#endif /* _ASM_X86_RESTARTABLE_SEQUENCES_H */
> diff --git a/arch/x86/kernel/Makefile b/arch/x86/kernel/Makefile
> index b1b78ff..ee98fb6 100644
> --- a/arch/x86/kernel/Makefile
> +++ b/arch/x86/kernel/Makefile
> @@ -110,6 +110,8 @@ obj-$(CONFIG_EFI)			+= sysfb_efi.o
> obj-$(CONFIG_PERF_EVENTS)		+= perf_regs.o
> obj-$(CONFIG_TRACING)			+= tracepoint.o
> 
> +obj-$(CONFIG_RESTARTABLE_SEQUENCES)	+= restartable_sequences.o
> +
> ###
> # 64 bit specific files
> ifeq ($(CONFIG_X86_64),y)
> diff --git a/arch/x86/kernel/restartable_sequences.c
> b/arch/x86/kernel/restartable_sequences.c
> new file mode 100644
> index 0000000..9f43efd
> --- /dev/null
> +++ b/arch/x86/kernel/restartable_sequences.c
> @@ -0,0 +1,136 @@
> +/*
> + * Restartable Sequences: x86 ABI.
> + *
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License as published by
> + * the Free Software Foundation; either version 2 of the License, or
> + * (at your option) any later version.
> + *
> + * This program is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> + * GNU General Public License for more details.
> + *
> + * You should have received a copy of the GNU General Public License
> + * along with this program; if not, write to the Free Software
> + * Foundation, 51 Franklin Street, Fifth Floor, Boston, MA  02110-1301, USA.
> + *
> + * Copyright (C) 2015, Google, Inc.,
> + * Paul Turner <pjt@google.com> and Andrew Hunter <ahh@google.com>
> + *

Not sure why this empty line here ?

> + */
> +
> +#include <linux/sched.h>
> +#include <linux/uaccess.h>
> +#include <asm/ptrace.h>
> +#include <asm/processor-flags.h>
> +#include <asm/restartable_sequences.h>
> +
> +static inline u64 rseq_encode_cpu_and_event_count(int cpu, int event_count)
> +{
> +	return (u64)(event_count) << 32 | cpu;
> +}

Still wondering why those need to be combined.

> +
> +static inline int rseq_regs_cpu(struct pt_regs *regs, int is_i386)
> +{
> +#ifdef CONFIG_64BIT
> +	return is_i386 ? regs->ax : regs->dx & 0xFFFF;

Should it rather be ?

return is_i386 ? regs->ax : regs->dx & 0xFFFFFFFF;

> +#else
> +	return regs->ax;
> +#endif
> +}
> +
> +static inline int rseq_regs_event_count(struct pt_regs *regs, int is_i386)
> +{
> +#ifdef CONFIG_64BIT
> +	return is_i386 ? regs->dx : regs->dx >> 32;
> +#else
> +	return regs->dx;
> +#endif
> +}
> +
> +void arch_rseq_update_event_counter(struct pt_regs *regs)
> +{
> +	struct restartable_sequence_state *rseq_state = &current->rseq_state;
> +	int cpu = task_cpu(current);

Could we change task_cpu(current) for smp_processor_id() ?

> +	int is_i386 = test_tsk_thread_flag(current, TIF_IA32);

How should we consider x32 (is_x32_task()) ?

> +	int addr_size = is_i386 ? 4 : 8;

See other patch comment about using is_compat_task() and sizeof().

> +	long post_commit_instr = 0;
> +	u64 state_value;
> +
> +	/*
> +	 * Note: post_commit_instr must be zero-initialized above for the case
> +	 * of a 32-bit thread on a 64-bit system.
> +	 */
> +	if (copy_from_user(&post_commit_instr,
> +			   rseq_state->post_commit_instr_addr, addr_size)) {

Hrm, that's a little endian hack. Could this be cleaned up so
it won't break horribly when ported to big endian architectures ?

> +		goto fail;
> +	}
> +
> +	/* Handle potentially being within a critical section. */
> +	if (regs->ip < post_commit_instr) {

It appears that this mechanism is not responsible for handling
rseq that contain a call to sub-functions, given that those sub-functions
could be above post_commit_instr, and given that it would be incorrect
to move the instruction pointer outside of the scope of the inline assembly.
Are function calls supported within the rseq c.s. ?

> +		/*
> +		 * The ABI is relatively compact, with some differences for 32
> +		 * and 64-bit threads.
> +		 *
> +		 * Progress is ordered as follows:
> +		 *  *0. USerspace stores current event+cpu counter values

USerspace -> Userspace

> +		 *   1. Userspace loads the rip to move to at failure into cx
> +		 *   2. Userspace loads the rip of the instruction following
> +		 *      the critical section into a registered TLS address.
> +		 *   3. Userspace loads the values read at [0] into a known
> +		 *      location.
> +		 *   4. Userspace tests to see whether the current event and
> +		 *      cpu counter values match those stored at 0.  Manually
> +		 *      jumping to the address from [1] in the case of a
> +		 *      mismatch.
> +		 *
> +		 *      Note that if we are preempted or otherwise interrupted
> +		 *      then the kernel can also now perform this comparison
> +		 *      and conditionally jump us to [1].
> +		 *   4. Our final instruction bfeore [2] is then our commit.

bfeore -> before

> +		 *      The critical section is self-terminating.  [2] must
> +		 *      also be cleared at this point.
> +		 *
> +		 * For x86_64:
> +		 *   [3] uses rdx to represent cpu and event counter as a
> +		 *       single 64-bit value.
> +		 *
> +		 * For i386:
> +		 *   [3] uses ax for cpu and dx for the event_counter.
> +		 *
> +		 *  Both:
> +		 *   Instruction after commit: rseq_state->post_commit_instr
> +		 *   Current event and cpu state: rseq_state->event_and_cpu

^ see comments on patch 0/3 changelog.

> +		 *

Empty line to remove.

Thanks,

Mathieu

> +		 */
> +		if (rseq_regs_cpu(regs, is_i386) != cpu ||
> +		    rseq_regs_event_count(regs, is_i386) !=
> +				rseq_state->event_counter) {
> +			if (clear_user(rseq_state->post_commit_instr_addr,
> +						addr_size))
> +				goto fail;
> +
> +			/*
> +			 * We set this after potentially failing in clear_user
> +			 * so that the signal arrives at the faulting rip.
> +			 */
> +			regs->ip = regs->cx;
> +		}
> +	}
> +
> +	/* Update state. Compute the next expected state value. */
> +	state_value = rseq_encode_cpu_and_event_count(cpu,
> +			++rseq_state->event_counter);
> +
> +	if (put_user(state_value, rseq_state->event_and_cpu))
> +		goto fail;
> +
> +	return;
> +fail:
> +	/*
> +	 * User space has made some (invalid) protection change that does not
> +	 * allow us to safely continue execution.  SEGV is the result.
> +	 */
> +	force_sig(SIGSEGV, current);
> +}
> diff --git a/arch/x86/kernel/signal.c b/arch/x86/kernel/signal.c
> index b7ffb7c..58f8813 100644
> --- a/arch/x86/kernel/signal.c
> +++ b/arch/x86/kernel/signal.c
> @@ -30,6 +30,7 @@
> #include <asm/fpu/signal.h>
> #include <asm/vdso.h>
> #include <asm/mce.h>
> +#include <asm/restartable_sequences.h>
> #include <asm/sighandling.h>
> #include <asm/vm86.h>
> 
> @@ -615,6 +616,12 @@ setup_rt_frame(struct ksignal *ksig, struct pt_regs *regs)
> 	sigset_t *set = sigmask_to_save();
> 	compat_sigset_t *cset = (compat_sigset_t *) set;
> 
> +#ifdef CONFIG_RESTARTABLE_SEQUENCES
> +	/* Increment the event counter for the, present, pre-signal frame. */

There appears to be a lot of, commas, here. ;-)

> +	if (rseq_active(current))
> +		arch_rseq_update_event_counter(regs);
> +#endif
> +
> 	/* Set up the stack frame */
> 	if (is_ia32_frame()) {
> 		if (ksig->ka.sa.sa_flags & SA_SIGINFO)
> diff --git a/kernel/restartable_sequences.c b/kernel/restartable_sequences.c
> index c99a574..5449561 100644
> --- a/kernel/restartable_sequences.c
> +++ b/kernel/restartable_sequences.c
> @@ -24,17 +24,21 @@
> 
> #ifdef CONFIG_RESTARTABLE_SEQUENCES
> 
> +#include <asm/restartable_sequences.h>
> #include <linux/uaccess.h>
> #include <linux/preempt.h>
> #include <linux/syscalls.h>
> 
> static void rseq_sched_in_nop(struct preempt_notifier *pn, int cpu) {}
> -static void rseq_sched_out_nop(struct preempt_notifier *pn,
> -			       struct task_struct *next) {}
> +static void rseq_sched_out(struct preempt_notifier *pn,
> +			   struct task_struct *next)
> +{
> +	set_thread_flag(TIF_NOTIFY_RESUME);
> +}
> 
> static __read_mostly struct preempt_ops rseq_preempt_ops = {
> 	.sched_in = rseq_sched_in_nop,
> -	.sched_out = rseq_sched_out_nop,
> +	.sched_out = rseq_sched_out,
> };
> 
>  int rseq_configure_current(__user u64 *event_and_cpu,

-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com

  parent reply	other threads:[~2015-12-11 13:30 UTC|newest]

Thread overview: 69+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2015-10-27 23:56 [RFC PATCH 0/3] restartable sequences v2: fast user-space percpu critical sections Paul Turner
2015-10-27 23:56 ` Paul Turner
2015-10-27 23:56 ` [RFC PATCH v2 1/3] restartable sequences: user-space per-cpu " Paul Turner
     [not found]   ` <20151027235653.16059.8933.stgit-G8L5E6GV2z5XSTzz+wBt03oUN1GumTyQ7j82oEJ37pA@public.gmane.org>
2015-11-19 16:38     ` Johannes Berg
2015-11-19 16:38       ` Johannes Berg
2015-12-11 12:56     ` Mathieu Desnoyers
2015-12-11 12:56       ` Mathieu Desnoyers
2015-10-27 23:57 ` [RFC PATCH v2 2/3] restartable sequences: x86 ABI Paul Turner
     [not found]   ` <20151027235705.16059.63268.stgit-G8L5E6GV2z5XSTzz+wBt03oUN1GumTyQ7j82oEJ37pA@public.gmane.org>
2015-10-28  5:03     ` Peter Zijlstra
2015-10-28  5:03       ` Peter Zijlstra
     [not found]       ` <20151028050314.GC11242-IIpfhp3q70xmmu7s1q4rt2t3HXsI98Cx0E9HWUfgJXw@public.gmane.org>
2015-10-28  5:19         ` Paul Turner
2015-10-28  5:19           ` Paul Turner
2015-12-11 13:30     ` Mathieu Desnoyers [this message]
2015-12-11 13:30       ` Mathieu Desnoyers
2015-10-27 23:57 ` [RFC PATCH v2 3/3] restartable sequences: basic self-tests Paul Turner
     [not found]   ` <20151027235716.16059.47610.stgit-G8L5E6GV2z5XSTzz+wBt03oUN1GumTyQ7j82oEJ37pA@public.gmane.org>
2016-04-05 20:33     ` Mathieu Desnoyers
2016-04-05 20:33       ` Mathieu Desnoyers
     [not found]       ` <1276514010.46061.1459888406999.JavaMail.zimbra-vg+e7yoeK/dWk0Htik3J/w@public.gmane.org>
2016-04-06  7:43         ` Peter Zijlstra
2016-04-06  7:43           ` Peter Zijlstra
     [not found]           ` <20160406074309.GE3430-ndre7Fmf5hadTX5a5knrm8zTDFooKrT+cvkQGrU6aU0@public.gmane.org>
2016-04-06 13:39             ` Mathieu Desnoyers
2016-04-06 13:39               ` Mathieu Desnoyers
     [not found]               ` <528054829.46502.1459949962537.JavaMail.zimbra-vg+e7yoeK/dWk0Htik3J/w@public.gmane.org>
2016-04-06 19:25                 ` Peter Zijlstra
2016-04-06 19:25                   ` Peter Zijlstra
     [not found] ` <20151027235635.16059.11630.stgit-G8L5E6GV2z5XSTzz+wBt03oUN1GumTyQ7j82oEJ37pA@public.gmane.org>
2015-10-28 14:44   ` [RFC PATCH 0/3] restartable sequences v2: fast user-space percpu critical sections Dave Watson
2015-10-28 14:44     ` Dave Watson
2015-12-11 12:05   ` Mathieu Desnoyers
2015-12-11 12:05     ` Mathieu Desnoyers
     [not found]     ` <1070636085.232143.1449835536723.JavaMail.zimbra-vg+e7yoeK/dWk0Htik3J/w@public.gmane.org>
2015-12-11 13:39       ` Mathieu Desnoyers
2015-12-11 13:39         ` Mathieu Desnoyers
2016-04-06 15:56 ` Andy Lutomirski
2016-04-07 12:02   ` Peter Zijlstra
     [not found]     ` <20160407120254.GY3448-ndre7Fmf5hadTX5a5knrm8zTDFooKrT+cvkQGrU6aU0@public.gmane.org>
2016-04-07 14:35       ` Andy Lutomirski
2016-04-07 14:35         ` Andy Lutomirski
     [not found]         ` <CALCETrV0vcYcnBrs0axykJD=_BM28wKWVMG6bMzK8zh8R3m5fg-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2016-04-07 15:24           ` Peter Zijlstra
2016-04-07 15:24             ` Peter Zijlstra
     [not found]             ` <20160407152432.GZ3448-ndre7Fmf5hadTX5a5knrm8zTDFooKrT+cvkQGrU6aU0@public.gmane.org>
2016-04-07 15:39               ` Peter Zijlstra
2016-04-07 15:39                 ` Peter Zijlstra
2016-04-07 15:44               ` Andy Lutomirski
2016-04-07 15:44                 ` Andy Lutomirski
     [not found]                 ` <CALCETrU5ZL6Jajc=9up-j86vY_Xtt-gTFjdQE0sB0d=d-CJZ6A-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2016-04-07 15:53                   ` Peter Zijlstra
2016-04-07 15:53                     ` Peter Zijlstra
     [not found]                     ` <20160407155312.GA3448-ndre7Fmf5hadTX5a5knrm8zTDFooKrT+cvkQGrU6aU0@public.gmane.org>
2016-04-07 16:43                       ` Andy Lutomirski
2016-04-07 16:43                         ` Andy Lutomirski
     [not found]                         ` <CALCETrVGo1Di3qamxx1NAFUSN_o=-HnYRDpeVp7zrQEBwe5u-g-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2016-04-07 20:11                           ` Peter Zijlstra
2016-04-07 20:11                             ` Peter Zijlstra
     [not found]                             ` <20160407201156.GC3448-ndre7Fmf5hadTX5a5knrm8zTDFooKrT+cvkQGrU6aU0@public.gmane.org>
2016-04-07 22:05                               ` Andy Lutomirski
2016-04-07 22:05                                 ` Andy Lutomirski
     [not found]                                 ` <CALCETrXVReuuGGKW6EOV7tFFaK9RbwWxYvKdpUdvU=MpDaOtsQ-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2016-04-08  1:11                                   ` Mathieu Desnoyers
2016-04-08  1:11                                     ` Mathieu Desnoyers
2016-04-08  1:21                                     ` Andy Lutomirski
2016-04-08  2:05                                       ` Mathieu Desnoyers
2016-04-08  2:05                                         ` Mathieu Desnoyers
2016-04-08 17:46                                         ` Mathieu Desnoyers
     [not found]                                           ` <65466698.51122.1460137589499.JavaMail.zimbra-vg+e7yoeK/dWk0Htik3J/w@public.gmane.org>
2016-04-08 21:16                                             ` Andy Lutomirski
2016-04-08 21:16                                               ` Andy Lutomirski
2016-04-08 21:25                                             ` Linus Torvalds
2016-04-08 21:25                                               ` Linus Torvalds
     [not found]                                               ` <CA+55aFwqJmTy+Nz0k9N_2zsms51meTFMdvYYW5VHdiOq8Jjr7Q-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2016-04-10 14:07                                                 ` Mathieu Desnoyers
2016-04-10 14:07                                                   ` Mathieu Desnoyers
2016-04-08 11:02                                   ` Peter Zijlstra
2016-04-08 11:02                                     ` Peter Zijlstra
     [not found]                                     ` <20160408110232.GP3448-ndre7Fmf5hadTX5a5knrm8zTDFooKrT+cvkQGrU6aU0@public.gmane.org>
2016-04-08 15:57                                       ` Andy Lutomirski
2016-04-08 15:57                                         ` Andy Lutomirski
2016-04-08  6:41                           ` Peter Zijlstra
2016-04-08  6:41                             ` Peter Zijlstra
     [not found]                             ` <20160408064136.GJ3448-ndre7Fmf5hadTX5a5knrm8zTDFooKrT+cvkQGrU6aU0@public.gmane.org>
2016-04-08 15:58                               ` Andy Lutomirski
2016-04-08 15:58                                 ` Andy Lutomirski
2016-04-11 21:55                           ` Mathieu Desnoyers
2016-04-11 21:55                             ` Mathieu Desnoyers

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=72792257.232485.1449840615047.JavaMail.zimbra@efficios.com \
    --to=mathieu.desnoyers-vg+e7yoek/dwk0htik3j/w@public.gmane.org \
    --cc=ahh-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org \
    --cc=andi-Vw/NltI1exuRpAAqCnN02g@public.gmane.org \
    --cc=cl-vYTEC60ixJUAvxtiuMwx3w@public.gmane.org \
    --cc=commonly-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org \
    --cc=davejwatson-b10kYP2dOMg@public.gmane.org \
    --cc=hpa-YMNOUZJC4hwAvxtiuMwx3w@public.gmane.org \
    --cc=josh-iaAMLnmF4UmaiuxdJuQwMA@public.gmane.org \
    --cc=linux-api-u79uwXL29TY76Z2rM5mHXA@public.gmane.org \
    --cc=linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org \
    --cc=luto-kltTT9wpgjJwATOyAt5JVQ@public.gmane.org \
    --cc=mingo-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org \
    --cc=paulmck-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org \
    --cc=peterz-wEGCiKHe2LqWVfeAwA7xHQ@public.gmane.org \
    --cc=torvalds-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b@public.gmane.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.