From: Christian Brauner <brauner@kernel.org>
To: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Peter Zijlstra <peterz@infradead.org>,
linux-kernel@vger.kernel.org,
Thomas Gleixner <tglx@linutronix.de>,
"Paul E . McKenney" <paulmck@kernel.org>,
Boqun Feng <boqun.feng@gmail.com>,
"H . Peter Anvin" <hpa@zytor.com>, Paul Turner <pjt@google.com>,
linux-api@vger.kernel.org,
Christian Brauner <christian.brauner@ubuntu.com>,
Florian Weimer <fw@deneb.enyo.de>,
David.Laight@ACULAB.COM, carlos@redhat.com
Subject: Re: [RFC PATCH v2 1/2] rseq: x86: implement abort-at-ip extension
Date: Tue, 11 Jan 2022 12:05:56 +0100 [thread overview]
Message-ID: <20220111110556.inteixgtl5vpmka7@wittgenstein> (raw)
In-Reply-To: <20220110171611.8351-1-mathieu.desnoyers@efficios.com>
On Mon, Jan 10, 2022 at 12:16:10PM -0500, Mathieu Desnoyers wrote:
> Allow rseq critical section abort handlers to optionally figure out at
> which instruction pointer the rseq critical section was aborted.
>
> This allows implementing rseq critical sections with loops containing
> the commit instruction, for which having the commit as last instruction
> of the sequence is not possible. This is useful to implement adaptative
> mutexes aware of preemption in user-space. (see [1])
>
> This also allows implementing rseq critical sections with multiple
> commit steps, and use the abort-at-ip information to figure out what
> needs to be undone in the abort handler.
>
> Introduce the RSEQ_FLAG_QUERY_ABORT_AT_IP rseq system call flag. This
> lets userspace query whether the kernel and architecture supports the
> abort-at-ip rseq extension.
>
> Only provide this information for rseq critical sections for which the
> rseq_cs descriptor has the RSEQ_CS_FLAG_ABORT_AT_IP flag set. Abort
> handlers for critical sections with this flag set need to readjust the
> stack pointer. The abort-at-ip pointer is populated by the kernel on
> the top of stack on abort. For x86-32, the abort handler of an
> abort-at-ip critical section needs to add 4 bytes to the stack pointer.
> For x86-64, the abort hanler needs to add 136 bytes to the stack
> pointer: 8 bytes to skip the abort-at-ip value, and 128 bytes to skip
> the x86-64 redzone for leaf functions.
>
> [1] https://github.com/compudj/rseq-test/blob/adapt-lock-abort-at-ip/test-rseq-adaptative-lock.c#L80
>
> Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
> ---
> Changes since v1:
> - Use top of stack to place abort-at-ip value rather than ecx/rcx
> register,
> - Skip redzone on x86-64.
> ---
> arch/x86/include/asm/ptrace.h | 5 +++++
> arch/x86/kernel/ptrace.c | 12 ++++++++++++
> include/uapi/linux/rseq.h | 4 ++++
> kernel/rseq.c | 28 ++++++++++++++++++++++++++++
> 4 files changed, 49 insertions(+)
>
> diff --git a/arch/x86/include/asm/ptrace.h b/arch/x86/include/asm/ptrace.h
> index 703663175a5a..c96eb2448110 100644
> --- a/arch/x86/include/asm/ptrace.h
> +++ b/arch/x86/include/asm/ptrace.h
> @@ -387,5 +387,10 @@ extern int do_set_thread_area(struct task_struct *p, int idx,
> # define do_set_thread_area_64(p, s, t) (0)
> #endif
>
> +#ifdef CONFIG_RSEQ
> +# define RSEQ_ARCH_HAS_ABORT_AT_IP
> +int rseq_abort_at_ip(struct pt_regs *regs, unsigned long ip);
> +#endif
> +
> #endif /* !__ASSEMBLY__ */
> #endif /* _ASM_X86_PTRACE_H */
> diff --git a/arch/x86/kernel/ptrace.c b/arch/x86/kernel/ptrace.c
> index 6d2244c94799..561ed98d12ba 100644
> --- a/arch/x86/kernel/ptrace.c
> +++ b/arch/x86/kernel/ptrace.c
> @@ -1368,3 +1368,15 @@ void user_single_step_report(struct pt_regs *regs)
> {
> send_sigtrap(regs, 0, TRAP_BRKPT);
> }
> +
> +int rseq_abort_at_ip(struct pt_regs *regs, unsigned long ip)
> +{
> + if (user_64bit_mode(regs)) {
> + /* Need to skip redzone for leaf functions. */
> + regs->sp -= sizeof(u64) + 128;
> + return put_user(ip, (u64 __user *)regs->sp);
> + } else {
> + regs->sp -= sizeof(u32);
> + return put_user(ip, (u32 __user *)regs->sp);
> + }
I think it would be really helpful if you added the full explanation for
sizeof(u64) + 128 or -sizeof(u32) into this codepath or provide
constants. For folks not familiar with stuff like this it'll look like
magic numbers. :)
> +}
> diff --git a/include/uapi/linux/rseq.h b/include/uapi/linux/rseq.h
> index 9a402fdb60e9..3130232e6d0c 100644
> --- a/include/uapi/linux/rseq.h
> +++ b/include/uapi/linux/rseq.h
> @@ -20,12 +20,14 @@ enum rseq_cpu_id_state {
>
> enum rseq_flags {
> RSEQ_FLAG_UNREGISTER = (1 << 0),
> + RSEQ_FLAG_QUERY_ABORT_AT_IP = (1 << 1),
> };
>
> enum rseq_cs_flags_bit {
> RSEQ_CS_FLAG_NO_RESTART_ON_PREEMPT_BIT = 0,
> RSEQ_CS_FLAG_NO_RESTART_ON_SIGNAL_BIT = 1,
> RSEQ_CS_FLAG_NO_RESTART_ON_MIGRATE_BIT = 2,
> + RSEQ_CS_FLAG_ABORT_AT_IP_BIT = 3,
> };
>
> enum rseq_cs_flags {
> @@ -35,6 +37,8 @@ enum rseq_cs_flags {
> (1U << RSEQ_CS_FLAG_NO_RESTART_ON_SIGNAL_BIT),
> RSEQ_CS_FLAG_NO_RESTART_ON_MIGRATE =
> (1U << RSEQ_CS_FLAG_NO_RESTART_ON_MIGRATE_BIT),
> + RSEQ_CS_FLAG_ABORT_AT_IP =
> + (1U << RSEQ_CS_FLAG_ABORT_AT_IP_BIT),
> };
>
> /*
> diff --git a/kernel/rseq.c b/kernel/rseq.c
> index 6d45ac3dae7f..fb52f2d11b56 100644
> --- a/kernel/rseq.c
> +++ b/kernel/rseq.c
> @@ -21,6 +21,13 @@
> #define RSEQ_CS_PREEMPT_MIGRATE_FLAGS (RSEQ_CS_FLAG_NO_RESTART_ON_MIGRATE | \
> RSEQ_CS_FLAG_NO_RESTART_ON_PREEMPT)
>
> +#ifdef RSEQ_ARCH_HAS_ABORT_AT_IP
> +static bool rseq_has_abort_at_ip(void) { return true; }
> +#else
> +static bool rseq_has_abort_at_ip(void) { return false; }
> +static int rseq_abort_at_ip(struct pt_regs *regs, unsigned long ip) { return 0; }
> +#endif
> +
> /*
> *
> * Restartable sequences are a lightweight interface that allows
> @@ -79,6 +86,16 @@
> *
> * [abort_ip]
> * F1. <failure>
> + *
> + * rseq critical sections defined with the RSEQ_CS_FLAG_ABORT_AT_IP flag
> + * have the following behavior on abort: when the stack grows down: the
> + * stack pointer is decremented to skip the redzone, and decremented of
> + * the pointer size. The aborted address (abort-at-ip) is stored at
> + * this stack pointer location. The user-space abort handler needs to
> + * pop the abort-at-ip address from the stack, and add the redzone size
> + * to the stack pointer.
> + *
> + * TODO: describe stack grows up.
Is this intentional or did you forget? :)
next prev parent reply other threads:[~2022-01-11 11:06 UTC|newest]
Thread overview: 13+ messages / expand[flat|nested] mbox.gz Atom feed top
2022-01-10 17:16 [RFC PATCH v2 1/2] rseq: x86: implement abort-at-ip extension Mathieu Desnoyers
2022-01-10 17:16 ` [RFC PATCH v2 2/2] selftests: rseq: test abort-at-ip extension on x86 Mathieu Desnoyers
2022-01-11 11:05 ` Christian Brauner [this message]
2022-01-11 17:43 ` [RFC PATCH v2 1/2] rseq: x86: implement abort-at-ip extension Mathieu Desnoyers
2022-01-12 8:46 ` Christian Brauner
2022-01-12 14:47 ` Mathieu Desnoyers
2022-01-12 14:55 ` Christian Brauner
2022-01-12 14:58 ` David Laight
2022-01-12 15:05 ` Mathieu Desnoyers
2022-01-12 15:15 ` David Laight
2022-01-12 15:24 ` Mathieu Desnoyers
2022-01-12 15:34 ` Peter Zijlstra
2022-01-12 15:53 ` David Laight
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20220111110556.inteixgtl5vpmka7@wittgenstein \
--to=brauner@kernel.org \
--cc=David.Laight@ACULAB.COM \
--cc=boqun.feng@gmail.com \
--cc=carlos@redhat.com \
--cc=christian.brauner@ubuntu.com \
--cc=fw@deneb.enyo.de \
--cc=hpa@zytor.com \
--cc=linux-api@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=mathieu.desnoyers@efficios.com \
--cc=paulmck@kernel.org \
--cc=peterz@infradead.org \
--cc=pjt@google.com \
--cc=tglx@linutronix.de \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox