Re: [PATCH] [RFC] syscalls,x86: Add execveat() system call (v2)

public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed

From: Al Viro <viro@ZenIV.linux.org.uk>
To: Meredydd Luff <meredydd@senatehouse.org>
Cc: "H. Peter Anvin" <hpa@zytor.com>,
	linux-kernel@vger.kernel.org, Kees Cook <keescook@chromium.org>,
	Ingo Molnar <mingo@redhat.com>, Jeff Dike <jdike@addtoit.com>,
	Richard Weinberger <richard@nod.at>,
	Andrew Morton <akpm@linux-foundation.org>,
	linux-arch@vger.kernel.org
Subject: Re: [PATCH] [RFC] syscalls,x86: Add execveat() system call (v2)
Date: Thu, 2 Aug 2012 11:30:02 +0100	[thread overview]
Message-ID: <20120802103001.GJ6481@ZenIV.linux.org.uk> (raw)
In-Reply-To: <CAD=T17Gkryua5TQW5h18kCu+yiViuVv6NyxtMw3C9we0RWzmKQ@mail.gmail.com>

On Thu, Aug 02, 2012 at 10:14:53AM +0100, Meredydd Luff wrote:
> On Thu, Aug 2, 2012 at 7:55 AM, Al Viro <viro@zeniv.linux.org.uk> wrote:
> >> This means you need an x32 version of the function -- execve
> >> unfortunately is one of the few system calls which require a special x32
> >> version (although it's a simple wrapper around sys32_execve).  See
> >> sys_x32_execve.
> >
> > I *really* strongly object to doing that thing before we sanitize the
> > situation with sys_execve().
> 
> "That thing" = "creating an x32 entry stub", or "merging execveat() at all"?
> 
> (snip)
> > The thing is, there's essentially no reason to have more than one
> > implementation.  What they are (badly) doing is "we need to find
> > pt_regs to pass to do_execve(), the thing we are after has to be near
> > our stack frame, so let's try to get to it that way".
> 
> Hang on...it's not just sys_execve that fits that description, is it?
> You seem to be describing every call that needs a pt_regs parameter,
> which at a glance is anything with a stub_ or PTREGSCALL in
> arch/x86/kernel/entry_{32,64}.S. That's: clone, fork, vfork,
> sigaltstack, iopl, execve, sigreturn, rt_sigreturn, vm86, vm86old.
> Most of those are handled by a common PTREGSCALL macro, but there are
> a few that get special treatment (different set on each arch - on
> x86-64 it's execve and rt_sigreturn ; on i386 it's just clone).
> 
> Is there's something special about execve in particular, or do you
> want to overhaul all the ptregscalls?

You are looking at that from the wrong side; it's not that this
set of syscalls on x86 has the same kind of wrapper - it's that
on different architectures the kludges are seriously different
and fairly brittle.  sigaltstack(), BTW, doesn't need to be
arch-specific at all - if you check what its pt_regs argument
is used for, we just need something like current_user_stack_pointer()
and that's it.  It's a different patch series, anyway.

There's a difference between "here's the syscall implementation,
here's its hookup for x86, let other architectures update their
syscall tables" and "here's the implementation in arch/x86 and its
hookup for x86; let other architectures port it in whatever way
they need".  The latter is a recipe for breakage - we'd been
through that kind of story quite a few times.

Out of that set, vm86/vm86old/iopl are genuine x86-only syscalls.
sigreturn and rt_sigreturn are weird, subtle and, sadly,
unmergable - syscall restart prevention logics is different
enough to make it not feasible at the moment.  It's a serious
source of PITA, BTW - quite a few of those had the same bugs,
years after they'd been discovered and fixed on a subset of
architectures.  Been there, fixed some of those, the latest
batch this April...  fork/vfork/clone is an interesting
question - IIRC, there are seriously subtle issues of some
sort on sparc, but I don't remember details right now.

Really, go and grep for do_execve; most of the callers will
be in execve(2) implementations.  Look at them carefully -
while some get pt_regs from some wrapper, there's a bunch
that does that manually in C.  "Ugly" doesn't even begin
to describe what's being done...

FWIW, I've just pushed (completely untested) arm and alpha
parts of what I described into signal.git#execve2; x86 is
next.  Note that after that sys_execve() is identical on
converted architectures and can be merged; ditto for
kernel_execve().  After I do x86 counterpart, I'll
take those guys to fs/exec.c under ifdef for new __ARCH_HAS_...
(and define it on already converted ones, obviously).
Then your patch goes there, except that implementation
gets put into fs/exec.c, under the same ifdef.  And with
current_pt_regs() used instead of the extra argument,
of course.  From that point on it can be used on any converted
architecture.  And conversion consists of
	* providing ret_from_kernel_execve() that would
put pt_regs where they belong and return to userland.
	* providing current_pt_regs(), with default being
just task_pt_regs(current).
	* defining that new __ARCH_HAS_...
	* removing sys_execve()/kernel_execve() implementations;
the ones in fs/exec.c will work just fine.
Can be done at leisure, architecture by architecture.  It's
obviously the next cycle fodder - we have only a couple of
days left of this merge window, anyway.

I really think that this pair of primitives is the right
way to deal with execve mess.

next prev parent reply	other threads:[~2012-08-02 10:30 UTC|newest]

Thread overview: 12+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2012-08-01 22:10 [PATCH] [RFC] syscalls,x86: Add execveat() system call (v2) Meredydd Luff
2012-08-01 22:53 ` H. Peter Anvin
2012-08-01 23:09   ` Meredydd Luff
2012-08-01 23:30     ` H. Peter Anvin
2012-08-01 23:32       ` H. Peter Anvin
     [not found]         ` <CAD=T17GaEMU_PDDvaJ9GVRUtzTZ069AN2yTuX6=rVo6jNO+6oA@mail.gmail.com>
2012-08-02  0:52           ` H. Peter Anvin
2012-08-02  6:55       ` Al Viro
2012-08-02  9:14         ` Meredydd Luff
2012-08-02 10:30           ` Al Viro [this message]
2012-08-02 10:51             ` Meredydd Luff
2012-08-09 19:19 ` Andy Lutomirski
2012-08-09 20:29   ` H. Peter Anvin

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20120802103001.GJ6481@ZenIV.linux.org.uk \
    --to=viro@zeniv.linux.org.uk \
    --cc=akpm@linux-foundation.org \
    --cc=hpa@zytor.com \
    --cc=jdike@addtoit.com \
    --cc=keescook@chromium.org \
    --cc=linux-arch@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=meredydd@senatehouse.org \
    --cc=mingo@redhat.com \
    --cc=richard@nod.at \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox