Re: [patch V3 01/13] entry: Provide generic syscall entry functionality

All of lore.kernel.org
 help / color / mirror / Atom feed

From: Kees Cook <keescook@chromium.org>
To: Thomas Gleixner <tglx@linutronix.de>
Cc: LKML <linux-kernel@vger.kernel.org>,
	x86@kernel.org, linux-arch@vger.kernel.org,
	Will Deacon <will@kernel.org>, Arnd Bergmann <arnd@arndb.de>,
	Mark Rutland <mark.rutland@arm.com>,
	Keno Fischer <keno@juliacomputing.com>,
	Paolo Bonzini <pbonzini@redhat.com>,
	kvm@vger.kernel.org,
	Gabriel Krisman Bertazi <krisman@collabora.com>
Subject: Re: [patch V3 01/13] entry: Provide generic syscall entry functionality
Date: Thu, 16 Jul 2020 13:52:33 -0700	[thread overview]
Message-ID: <202007161336.B993ED938@keescook> (raw)
In-Reply-To: <20200716185424.011950288@linutronix.de>

On Thu, Jul 16, 2020 at 08:22:09PM +0200, Thomas Gleixner wrote:
> From: Thomas Gleixner <tglx@linutronix.de>
> 
> On syscall entry certain work needs to be done:
> 
>    - Establish state (lockdep, context tracking, tracing)
>    - Conditional work (ptrace, seccomp, audit...)
> 
> This code is needlessly duplicated and  different in all
> architectures.
> 
> Provide a generic version based on the x86 implementation which has all the
> RCU and instrumentation bits right.

Ahh! You're reading my mind! I was just thinking about this while
reviewing the proposed syscall redirection series[1], and pondering the
lack of x86 TIF flags, and that nearly everything in the series (and for
seccomp and other things) didn't need to be arch-specific. And now that
series absolutely needs to be rebased and it'll magically work for every
arch that switches to the generic entry code. :)

Notes below...

[1] https://lore.kernel.org/lkml/20200716193141.4068476-2-krisman@collabora.com/

> +/*
> + * Define dummy _TIF work flags if not defined by the architecture or for
> + * disabled functionality.
> + */

When I was thinking about this last week I was pondering having a split
between the arch-agnositc TIF flags and the arch-specific TIF flags, and
that each arch could have a single "there is agnostic work to be done"
TIF in their thread_info, and the agnostic flags could live in
task_struct or something. Anyway, I'll keep reading...

> +/**
> + * syscall_enter_from_user_mode - Check and handle work before invoking
> + *				 a syscall
> + * @regs:	Pointer to currents pt_regs
> + * @syscall:	The syscall number
> + *
> + * Invoked from architecture specific syscall entry code with interrupts
> + * disabled. The calling code has to be non-instrumentable. When the
> + * function returns all state is correct and the subsequent functions can be
> + * instrumented.
> + *
> + * Returns: The original or a modified syscall number
> + *
> + * If the returned syscall number is -1 then the syscall should be
> + * skipped. In this case the caller may invoke syscall_set_error() or
> + * syscall_set_return_value() first.  If neither of those are called and -1
> + * is returned, then the syscall will fail with ENOSYS.

There's been some recent confusion over "has the syscall changed,
or did seccomp request it be skipped?" that was explored in arm64[2]
(though I see Will and Keno in CC already). There might need to be a
clearer way to distinguish between "wild userspace issued a -1 syscall"
and "seccomp or ptrace asked for the syscall to be skipped". The
difference is mostly about when ENOSYS gets set, with respect to calls
to syscall_set_return_value(), but if the syscall gets changed, the arch
may need to recheck the value and consider ENOSYS, etc. IIUC, what Will
ended up with[3] was having syscall_trace_enter() return the syscall return
value instead of the new syscall.

[2] https://lore.kernel.org/lkml/20200704125027.GB21185@willie-the-truck/
[3] https://lore.kernel.org/lkml/20200703083914.GA18516@willie-the-truck/

> +static long syscall_trace_enter(struct pt_regs *regs, long syscall,
> +				unsigned long ti_work)
> +{
> +	long ret = 0;
> +
> +	/* Handle ptrace */
> +	if (ti_work & (_TIF_SYSCALL_TRACE | _TIF_SYSCALL_EMU)) {
> +		ret = arch_syscall_enter_tracehook(regs);
> +		if (ret || (ti_work & _TIF_SYSCALL_EMU))
> +			return -1L;
> +	}
> +
> +	/* Do seccomp after ptrace, to catch any tracer changes. */
> +	if (ti_work & _TIF_SECCOMP) {
> +		ret = arch_syscall_enter_seccomp(regs);
> +		if (ret == -1L)
> +			return ret;
> +	}
> +
> +	if (unlikely(ti_work & _TIF_SYSCALL_TRACEPOINT))
> +		trace_sys_enter(regs, syscall);
> +
> +	arch_syscall_enter_audit(regs);
> +
> +	return ret ? : syscall;
> +}

Modulo the notes about -1 vs syscall number above, this looks correct to
me for ptrace and seccomp.

-- 
Kees Cook

next prev parent reply	other threads:[~2020-07-16 20:52 UTC|newest]

Thread overview: 33+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-07-16 18:22 [patch V3 00/13] entry, x86, kvm: Generic entry/exit functionality for host and guest Thomas Gleixner
2020-07-16 18:22 ` [patch V3 01/13] entry: Provide generic syscall entry functionality Thomas Gleixner
2020-07-16 20:52   ` Kees Cook [this message]
2020-07-16 21:55     ` Thomas Gleixner
2020-07-17 17:49       ` Kees Cook
2020-07-17 19:29         ` Thomas Gleixner
2020-07-17 21:56           ` Andy Lutomirski
2020-07-18 14:16             ` Thomas Gleixner
2020-07-18 14:41               ` Andy Lutomirski
2020-07-19 10:17                 ` Thomas Gleixner
2020-07-19 10:17                   ` Thomas Gleixner
2020-07-19 10:17                   ` Thomas Gleixner
2020-07-19 15:25                   ` Andy Lutomirski
2020-07-20  6:50                     ` Thomas Gleixner
2020-07-27 22:28   ` Andy Lutomirski
2020-07-16 18:22 ` [patch V3 02/13] entry: Provide generic syscall exit function Thomas Gleixner
2020-07-16 20:55   ` Kees Cook
2020-07-16 21:28     ` Thomas Gleixner
2020-07-16 18:22 ` [patch V3 03/13] entry: Provide generic interrupt entry/exit code Thomas Gleixner
2020-07-16 18:22 ` [patch V3 04/13] entry: Provide infrastructure for work before exiting to guest mode Thomas Gleixner
2020-07-16 18:22 ` [patch V3 05/13] x86/entry: Consolidate check_user_regs() Thomas Gleixner
2020-07-16 20:56   ` Kees Cook
2020-07-16 18:22 ` [patch V3 06/13] x86/entry: Consolidate 32/64 bit syscall entry Thomas Gleixner
2020-07-16 18:22 ` [patch V3 07/13] x86/ptrace: Provide pt_regs helpers for entry/exit Thomas Gleixner
2020-07-16 20:57   ` Kees Cook
2020-07-16 18:22 ` [patch V3 08/13] x86/entry: Use generic syscall entry function Thomas Gleixner
2020-07-16 21:13   ` Kees Cook
2020-07-16 21:33     ` Thomas Gleixner
2020-07-16 18:22 ` [patch V3 09/13] x86/entry: Use generic syscall exit functionality Thomas Gleixner
2020-07-16 18:22 ` [patch V3 10/13] x86/entry: Cleanup idtentry_entry/exit_user Thomas Gleixner
2020-07-16 18:22 ` [patch V3 11/13] x86/entry: Use generic interrupt entry/exit code Thomas Gleixner
2020-07-16 18:22 ` [patch V3 12/13] x86/entry: Cleanup idtentry_enter/exit Thomas Gleixner
2020-07-16 18:22 ` [patch V3 13/13] x86/kvm: Use generic exit to guest work function Thomas Gleixner

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=202007161336.B993ED938@keescook \
    --to=keescook@chromium.org \
    --cc=arnd@arndb.de \
    --cc=keno@juliacomputing.com \
    --cc=krisman@collabora.com \
    --cc=kvm@vger.kernel.org \
    --cc=linux-arch@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mark.rutland@arm.com \
    --cc=pbonzini@redhat.com \
    --cc=tglx@linutronix.de \
    --cc=will@kernel.org \
    --cc=x86@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.