linux-api.vger.kernel.org archive mirror
From: Andy Lutomirski <luto@amacapital.net>
To: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>,
	Andrew Lutomirski <luto@kernel.org>,
	Thomas Gleixner <tglx@linutronix.de>,
	LKML <linux-kernel@vger.kernel.org>,
	Linux API <linux-api@vger.kernel.org>,
	Peter Zijlstra <peterz@infradead.org>,
	"Paul E. McKenney" <paulmck@linux.vnet.ibm.com>,
	Boqun Feng <boqun.feng@gmail.com>,
	Dave Watson <davejwatson@fb.com>, Paul Turner <pjt@google.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	Russell King <linux@arm.linux.org.uk>,
	Ingo Molnar <mingo@redhat.com>, "H. Peter Anvin" <hpa@zytor.com>,
	Andi Kleen <andi@firstfloor.org>,
	Christoph Lameter <cl@linux.com>, Ben Maurer <bmaurer@fb.com>,
	Steven Rostedt <rostedt@goodmis.org>,
	Josh Triplett <josh@joshtriplett.org>,
	Catalin Marinas <catalin.marinas@arm.com>,
	Will
Subject: Re: [RFC PATCH for 4.18 1/2] rseq: validate rseq_cs fields are < TASK_SIZE
Date: Mon, 2 Jul 2018 10:11:13 -0700
Message-ID: <CALCETrVeu_V-3K8OfKv9QixrWBJs+q866yb=e96sC76SS4hxYg@mail.gmail.com>
In-Reply-To: <1527399163.10673.1530541966296.JavaMail.zimbra@efficios.com>

On Mon, Jul 2, 2018 at 7:32 AM Mathieu Desnoyers
<mathieu.desnoyers@efficios.com> wrote:
>
> ----- On Jun 29, 2018, at 4:39 PM, Andy Lutomirski luto@amacapital.net wrote:
>
> > On Fri, Jun 29, 2018 at 12:48 PM, Mathieu Desnoyers
> > <mathieu.desnoyers@efficios.com> wrote:
> >> There are two aspects I'm concerned about here:
> >>
> >> 1) security: we don't want 32-bit user-space to feed a 64-bit value over 4GB
> >>    as abort_ip that may end up causing OOPSes on architectures that would
> >>    lack proper validation of those values on return to userspace.
> >
> > I'm not too worried about this.  As long as you're doing it from
> > signal-delivery context (which you are AFAICT) you're fine.
>
> No, it's not just signal-delivery context. It's _also_ called from
> return to usermode loop, which can be called on return from
> interrupt/trap/syscall.
>

TIF_NOTIFY_RESUME context in the exit slowpath is fine, too.

> >
> > But I re-read the code and I think I have a really straightforward
> > solution.  Two choices:
> >
> > (1) Change instruction_pointer_set() to return an error code if the
> > address passed in is garbage in a way that could cause unexpected
> > behavior (like >=2^32 on x86_64 if regs->cs is 32-bit).  It has very
> > very few callers.
>
> This would take care of my security concern wrt abort_ip, but would not
> provide consistent behavior for the other fields. Also, perhaps this
> kind of change should aim for the next merge window?

It's not about security.  The idea is that instruction_pointer_set()
should return some indication of whether it actually set the
instruction pointer to the requested value.  On x86, if you have
!user_64bit_mode(regs) and you call instruction_pointer_set() to set
ip to 0xbaadc0de12345678, then you end up with a state where we will
probably execute user code at the address 0x12345678.  Conversely, if
you have user_64bit_mode(regs) == true and you set ip to
0xbaadc0de12345678, then you will end up sending a signal to the task
because 0xbaadc0de12345678 is not executable (and, in fact, is highly
likely to be noncanonical).

So I would argue that the semantics *should* be:

/*
 * Attempts to modify @regs such that the next user instruction to be
 * executed is the instruction at @addr.  instruction_pointer_set() may
 * return false to indicate that addr was invalid in the sense that the
 * next user instruction executed might be some other address instead.
 * The most likely cause is that regs refers to a 32-bit compat context,
 * addr != (u32)addr, and the architecture might silently truncate the
 * address on the next return to user code.
 *
 * instruction_pointer_set() must only be called from a context in which
 * the architecture allows arbitrary modifications of @regs.
 *
 * Architecture implementations promise that calling
 * instruction_pointer_set() will not crash or otherwise corrupt the
 * kernel when called from a valid context, regardless of what value is
 * passed in @addr.
 */
bool instruction_pointer_set(struct pt_regs *regs, unsigned long addr);
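
To make the proposed contract concrete, here is a rough userspace model
of it.  This is only a sketch, not kernel code: struct mock_regs, its
user_64bit flag, and both mock_* functions are made up for illustration,
and the choice to store the address even when returning false is my
assumption (the comment above leaves that open).

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for struct pt_regs; not the kernel's layout. */
struct mock_regs {
	uint64_t ip;
	bool user_64bit;	/* models user_64bit_mode(regs) on x86 */
};

/*
 * Model of the proposed semantics: store the address, and return false
 * when a 32-bit context would silently truncate it on return to user
 * mode.  Never fails in any other way, whatever addr is.
 */
static bool mock_instruction_pointer_set(struct mock_regs *regs, uint64_t addr)
{
	regs->ip = addr;
	if (!regs->user_64bit && addr != (uint32_t)addr)
		return false;
	return true;
}

/* Address user code would actually resume at in this model. */
static uint64_t mock_effective_ip(const struct mock_regs *regs)
{
	return regs->user_64bit ? regs->ip : (uint32_t)regs->ip;
}
```

With a 32-bit context and addr 0xbaadc0de12345678, the model returns
false and the effective ip is 0x12345678, which is the case the caller
needs to hear about.  With a 64-bit context the same call returns true,
and the task would simply fault on the bad address.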

>
> >
> > (2) Add instruction_pointer_validate() to go along with
> > instruction_pointer_set().
> >
> > That should be enough to solve the problem, right?
>
> This would only handle the "security" part of the matter, which
> is specifically related to rseq->rseq_cs->abort_ip.
>
> What is left is ensuring that we have consistent behavior for
> other fields:
>
> [ Note: we have introduced this helper macro: LINUX_FIELD_u32_u64
> which defines a field which is 64-bit for 64-bit processes, and 32-bit
> with 32 bits of padding for 32-bit processes. ]
>
> * rseq->rseq_cs: (userspace pointer to user-space, updated by user-space
>   with single-copy atomicity): current type: LINUX_FIELD_u32_u64,
>   cannot be changed to __u64 due to single-copy atomicity requirement,
>
> * rseq->rseq_cs->start_ip: currently a LINUX_FIELD_u32_u64,
>   could become a __u64,
>
> * rseq->rseq_cs->post_commit_ip: currently a LINUX_FIELD_u32_u64,
>   could become a __u64,
>
> * rseq->rseq_cs->abort_ip: currently a LINUX_FIELD_u32_u64,
>   could become a __u64,
>
> For abort_ip, changing the type to __u64 and using the
> instruction_pointer_validate() approach you propose would work.
>
> For start_ip and post_commit_ip, we need to decide whether we
> want to kill a 32-bit process setting the high bits or if we just
> accept and use the full __u64 content on both 32-bit and 64-bit
> kernels. Those two fields are only used for arithmetic comparison.
> Using the full __u64 content means using 64-bit arithmetic on
> 32-bit native kernels though.

Just use the 64-bit values, I think.  I see no point in killing the task.
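
As a sketch of why plain 64-bit comparison is harmless here (assuming
the fields become __u64 and the critical section is the half-open range
[start_ip, post_commit_ip); struct mock_rseq_cs and mock_in_rseq_cs()
are hypothetical names, not the actual rseq code):

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical descriptor with all three fields widened to 64 bits. */
struct mock_rseq_cs {
	uint64_t start_ip;
	uint64_t post_commit_ip;
	uint64_t abort_ip;
};

/*
 * Pure arithmetic comparison on the full 64-bit values.  If a 32-bit
 * task puts garbage in the high bits, its ip (which fits in 32 bits)
 * just never falls inside the range.
 */
static bool mock_in_rseq_cs(uint64_t ip, const struct mock_rseq_cs *cs)
{
	return ip >= cs->start_ip && ip < cs->post_commit_ip;
}
```

A 32-bit task that sets the high bits only ensures its own critical
section never matches, so it never gets the abort it asked for.  That
is the task's problem, and nothing the kernel needs to kill it over.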


>
> For rseq->rseq_cs, we cannot use __u64 due to single-copy atomicity
> update requirement for 32-bit processes. However, we are using this
> field in a copy_from_user(), so it will EFAULT if the high-bits are
> set by a compat 32-bit task on a 64-bit kernel. We can therefore check
> that the padding is zeroed explicitly on a native 32-bit kernel to
> provide a consistent behavior. Specifically because rseq->rseq_cs is
> checked with access_ok(), it is therefore enough to check the padding
> when __LP64__ is not defined by the preprocessor.

Agreed.
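
For reference, a small userspace model of the two paths being made
consistent.  Everything named mock_* is hypothetical, the little-endian
pointer-then-padding layout is an assumption, the range check is a
simplified stand-in for access_ok(), and returning -EFAULT from the
padding check is my guess at what "consistent" would mean:

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed layout of rseq->rseq_cs as written by a 32-bit process. */
struct mock_rseq_cs_field {
	uint32_t ptr32;		/* updated with single-copy atomicity */
	uint32_t padding;	/* expected to be zero */
};

static int mock_read_rseq_cs_ptr(const struct mock_rseq_cs_field *f,
				 bool kernel_is_64bit, uint64_t task_size,
				 uint64_t *out)
{
	uint64_t ptr;

	if (kernel_is_64bit) {
		/* 64-bit kernel reads all 64 bits; padding is the high half. */
		ptr = ((uint64_t)f->padding << 32) | f->ptr32;
	} else {
		/* Native 32-bit kernel: explicit check, as discussed above. */
		if (f->padding != 0)
			return -EFAULT;
		ptr = f->ptr32;
	}

	/* Simplified stand-in for the access_ok()/copy_from_user() check. */
	if (ptr >= task_size)
		return -EFAULT;

	*out = ptr;
	return 0;
}
```

In this model a nonzero padding word fails on both kinds of kernel: the
64-bit kernel rejects it because the resulting address is out of range
for the compat task, and the 32-bit kernel rejects it explicitly.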

>
> But rather than trying to play games with input validation, I would
> favor an approach that would allow rseq to validate all its inputs
> straightforwardly. Introducing user_64bit_mode(struct pt_regs *)
> across all architectures would allow doing just that.

I would be okay with that, too, but I think it would have to be
user_64bit_mode(task, regs), since sane architectures would have the
task bitness somewhere other than in regs.  x86 is IMO rather weird in
this regard.  When I added user_64bit_mode(), I didn't envision its use
outside x86 arch code.

> AFAIU this could be achieved by re-introducing is_compat_task() on x86 as:
>
> #ifdef CONFIG_COMPAT
> static bool is_compat_task(void)
> {
>         return !user_64bit_mode(current_pt_regs());
> }
> #else
> static bool is_compat_task(void) { return false; }
> #endif
>
> Or am I missing something ?

is_compat_task() historically literally meant "am I in a compat system
call".  It never worked consistently on x86 outside of syscall
context.  While I do have fundamental objections to having a generic
concept of "is this a compat task?" on Linux, that's not why I removed
is_compat_task().  I removed it because it didn't do what the name
suggested.

Unfortunately, while it's gone from generic code, it's still there on
non-x86 arches, and it probably still has inconsistent semantics.  So
I don't want to re-add it.

But I think that the limited solution of changing
instruction_pointer_set() really is a sufficient
architecture-dependent change to fully solve your problem.
