Re: [PATCH] mm_release: Do a set_fs(USER_DS) before handling clear_child_tid.

From: Nelson Elhage <nelhage@ksplice.com>
To: Andrew Morton <akpm@linux-foundation.org>
Cc: linux-kernel@vger.kernel.org
Subject: Re: [PATCH] mm_release: Do a set_fs(USER_DS) before handling clear_child_tid.
Date: Tue, 30 Nov 2010 19:59:09 -0500
Message-ID: <20101201005909.GC18995@ksplice.com>
In-Reply-To: <20101130160950.96153286.akpm@linux-foundation.org>

On Tue, Nov 30, 2010 at 04:09:50PM -0800, Andrew Morton wrote:
> On Mon, 29 Nov 2010 21:19:16 -0500
> Nelson Elhage <nelhage@ksplice.com> wrote:
> 
> > If a user manages to trigger a kernel BUG() or page fault with fs set to
> > KERNEL_DS, fs is not otherwise reset before do_exit(), allowing the user to
> > write a 0 to an arbitrary address in kernel memory.
> > 
> > Signed-off-by: Nelson Elhage <nelhage@ksplice.com>
> > ---
> > AFAICT this is presently only triggerable in the presence of another bug, but
> > this potentially turns a lot of DoS bugs into privilege escalation, so it's
> > worth fixing. Among other things, sock_no_sendpage and the kernel_{read,write}v
> > calls in splice.c make it easy to call an awful lot of the kernel under
> > KERNEL_DS.
> > 
> > This isn't the only way we could fix this -- we could put the set_fs() at the
> > start of do_exit, or in all the callers that might potentially call do_exit with
> > KERNEL_DS set, or else we could do an access_ok inside fork(). I'm happy to put
> > together one of those patches if someone thinks another approach makes more
> > sense.
> > 
> >  kernel/fork.c |    5 +++++
> >  1 files changed, 5 insertions(+), 0 deletions(-)
> > 
> > diff --git a/kernel/fork.c b/kernel/fork.c
> > index 3b159c5..a68445e 100644
> > --- a/kernel/fork.c
> > +++ b/kernel/fork.c
> > @@ -636,7 +636,12 @@ void mm_release(struct task_struct *tsk, struct mm_struct *mm)
> >  			/*
> >  			 * We don't check the error code - if userspace has
> >  			 * not set up a proper pointer then tough luck.
> > +			 *
> > +			 * We do set_fs() explicitly in case this task
> > +			 * exited while inside set_fs(KERNEL_DS) for
> > +			 * some reason (e.g. on a BUG()).
> >  			 */
> > +			set_fs(USER_DS);
> >  			put_user(0, tsk->clear_child_tid);
> >  			sys_futex(tsk->clear_child_tid, FUTEX_WAKE,
> >  					1, NULL, NULL, 0);
> 
> Confused.  The user can only exploit the wrong addr_limit if control
> returns to userspace for the user's code to execute.  But that won't be
> happening, because this thread will unconditionally exit.

The user can exploit the wrong addr_limit on the very next line, with the
put_user() there. clear_child_tid is not checked in any way before this
point. Writing a single zero might not seem like much, but it's enough for
privilege escalation (e.g. overwrite the top half of a function pointer to point
to userspace).
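
As a rough illustration of the two points above, here is a user-space model
in C. It is only a sketch, not kernel code and not the PoC: the MODEL_*
constants, struct model_task, and the model_* functions are hypothetical
names, and the address-limit values are assumed x86-64-style numbers chosen
for the example. It shows that a range check against a KERNEL_DS-style limit
accepts a kernel address, and that zeroing the upper 32 bits of a 64-bit
kernel pointer leaves a value in the user range.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed, illustrative limits; not taken from any kernel header. */
#define MODEL_USER_DS   UINT64_C(0x00007fffffffffff) /* highest user address */
#define MODEL_KERNEL_DS UINT64_MAX                   /* effectively no limit */

/* Hypothetical stand-in for the two task fields that matter here. */
struct model_task {
    uint64_t addr_limit;      /* the value set_fs() would change */
    uint64_t clear_child_tid; /* user-supplied address, never validated */
};

/* Models the range check done by put_user(): is [addr, addr+size) allowed? */
static bool model_put_user_allowed(const struct model_task *t,
                                   uint64_t addr, uint64_t size)
{
    return size <= t->addr_limit && addr <= t->addr_limit - size;
}

/*
 * Models the exit path: optionally reset the limit first (the proposed
 * fix), then report whether the 4-byte zero write to clear_child_tid
 * would pass the check.
 */
static bool model_exit_writes_zero(struct model_task *t, bool reset_limit)
{
    if (reset_limit)
        t->addr_limit = MODEL_USER_DS;
    return model_put_user_allowed(t, t->clear_child_tid, sizeof(int));
}

/*
 * Models the effect of a 4-byte zero landing on the upper half of a
 * 64-bit function pointer (offset 4 on a little-endian machine).
 */
static uint64_t model_zero_top_half(uint64_t fn_ptr)
{
    return fn_ptr & UINT64_C(0x00000000ffffffff);
}
```

In this model, a task with addr_limit left at MODEL_KERNEL_DS and
clear_child_tid pointing at a kernel address passes the check, while the same
task fails it once the limit is reset. model_zero_top_half() applied to a
kernel-range pointer returns a value at or below MODEL_USER_DS, i.e. an
address a user could map.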

I have PoC code that uses this bug, along with CVE-2010-3849, to write a zero
to an arbitrary kernel address, so I've tested that this is not theoretical.

That's also why I put the set_fs() hidden inside mm_release, since that's the
only place where (to my knowledge) it matters.

On re-reading, I didn't mention clear_child_tid anywhere in the commit message,
which was an error on my part, and explains the confusion. Sorry about that, and
I hope this clears that up.

Let me know if this makes more sense, and I'll send a revised patch.

- Nelson

> 
> 
> If/when you unconfuse me, I'd suggest this change only be done if the
> thread is *known* to have oopsed - doing it for non-oopsed threads
> seems unpleasant to my mind.  And I think it should be done nice and
> clearly, right up inside do_exit() by some means.  Or perhaps in the
> oops code, just before it calls do_exit().  Not hidden down in
> mm_release().

Thread overview: 9+ messages
2010-11-30  2:19 [PATCH] mm_release: Do a set_fs(USER_DS) before handling clear_child_tid Nelson Elhage
2010-12-01  0:09 ` Andrew Morton
2010-12-01  0:59   ` Nelson Elhage [this message]
2010-12-01  1:49     ` Andrew Morton
2010-12-01  2:27       ` [PATCH v2] do_exit(): Make sure we run with get_fs() == USER_DS Nelson Elhage
2010-12-01  2:50         ` KOSAKI Motohiro
2010-12-02  0:30           ` Andrew Morton
2010-12-02  0:48             ` KOSAKI Motohiro
2010-12-02  1:12         ` Andrew Morton