* Re: [PATCH] [RFC] syscalls,x86: Add execveat() system call (v2)
[not found] ` <5019BC0E.4010109@zytor.com>
@ 2012-08-02 6:55 ` Al Viro
2012-08-02 9:14 ` Meredydd Luff
0 siblings, 1 reply; 4+ messages in thread
From: Al Viro @ 2012-08-02 6:55 UTC (permalink / raw)
To: H. Peter Anvin
Cc: Meredydd Luff, linux-kernel, Kees Cook, Ingo Molnar, Jeff Dike,
Richard Weinberger, Andrew Morton, linux-arch
On Wed, Aug 01, 2012 at 04:30:22PM -0700, H. Peter Anvin wrote:
> On 08/01/2012 04:09 PM, Meredydd Luff wrote:
> >>> #
> >>> # x32-specific system call numbers start at 512 to avoid cache impact
> >>
> >> I think that should be common, not 64 (as should kcmp be).
> >
> > I copied the original execve, which is 64.
> >
>
> Sorry, you're right. The argument vector needs compatibility support.
>
> This means you need an x32 version of the function -- execve
> unfortunately is one of the few system calls which require a special x32
> version (although it's a simple wrapper around sys32_execve). See
> sys_x32_execve.
I *really* strongly object to doing that thing before we sanitize the
situation with sys_execve(). As it is, the damn thing is defined
separately on each architecture, with spectaculary ugly kludges used
in these implementations. Adding a parallel pile of kludges (and
due to their nature, they'll need to be changed in non-trivial
way in a lot of cases) is simply wrong.
The thing is, there's essentially no reason to have more than one
implementation. What they are (badly) doing is "we need to find
pt_regs to pass to do_execve(), the thing we are after has to be near
our stack frame, so let's try to get to it that way". With really
ugly set of kludges trying to do just that.
What we should use instead is task_pt_regs(); maybe introduce
current_pt_regs(), defaulting to task_pt_regs(current) and letting
architectures that can do it better (on some it's simply
available in dedicated register, on some it's better to work
from current_thread_info(), etc.) override the default.
With that we have a fairly good chance to merge most of those
guys; probably not all of them, due to e.g. mips weirdness,
but enough to make it worth doing.
The obstacle is in lazy kernel_execve() implementations;
ones that simply issue a trap/whatever is used to enter
the system call. Directly from kernel space. It doesn't
have to be done that way; see what e.g. arm does there.
Note that doing it without syscall instruction avoids another
headache; namely, we don't have to worry about returning
from *failed* execve (i.e. return to kernel mode) through
the codepath that is normally taken only when returning
to userland.
FWIW, I would try to pull the asm tail of arm kernel_execve()
into something that would look to C side as
ret_from_kernel_exec(®s); /* never returns */
and start converting architectures to that primitive. It should
copy the provided pt_regs to normal location (keeping in mind
that there really might be an overlap), set registers (including
stack pointer) for normal return to user path and jump there.
Essentially, that's the real arch-dependent part of kernel_execve() -
transition from kernel thread to userland process.
It can be done architecture-by-architecture; there's no need to make
it a flagday conversion. Once an arch is handled, we define
something like __ARCH_HAS_RET_FROM_KERNEL_EXEC and get the common
implementations of kernel_execve() and sys_execve() for that -
those could simply live in fs/exec.c under the matching ifdef.
Along with your sys_execveat(). I can probably throw alpha,
arm and x86 conversions into the pile, but it really needs to
be handled on linux-arch, with arch maintainers at least agreeing
in principle with that scheme.
^ permalink raw reply [flat|nested] 4+ messages in thread
* Re: [PATCH] [RFC] syscalls,x86: Add execveat() system call (v2)
2012-08-02 6:55 ` [PATCH] [RFC] syscalls,x86: Add execveat() system call (v2) Al Viro
@ 2012-08-02 9:14 ` Meredydd Luff
2012-08-02 10:30 ` Al Viro
0 siblings, 1 reply; 4+ messages in thread
From: Meredydd Luff @ 2012-08-02 9:14 UTC (permalink / raw)
To: Al Viro
Cc: H. Peter Anvin, linux-kernel, Kees Cook, Ingo Molnar, Jeff Dike,
Richard Weinberger, Andrew Morton, linux-arch
On Thu, Aug 2, 2012 at 7:55 AM, Al Viro <viro@zeniv.linux.org.uk> wrote:
>> This means you need an x32 version of the function -- execve
>> unfortunately is one of the few system calls which require a special x32
>> version (although it's a simple wrapper around sys32_execve). See
>> sys_x32_execve.
>
> I *really* strongly object to doing that thing before we sanitize the
> situation with sys_execve().
"That thing" = "creating an x32 entry stub", or "merging execveat() at all"?
(snip)
> The thing is, there's essentially no reason to have more than one
> implementation. What they are (badly) doing is "we need to find
> pt_regs to pass to do_execve(), the thing we are after has to be near
> our stack frame, so let's try to get to it that way".
Hang on...it's not just sys_execve that fits that description, is it?
You seem to be describing every call that needs a pt_regs parameter,
which at a glance is anything with a stub_ or PTREGSCALL in
arch/x86/kernel/entry_{32,64}.S. That's: clone, fork, vfork,
sigaltstack, iopl, execve, sigreturn, rt_sigreturn, vm86, vm86old.
Most of those are handled by a common PTREGSCALL macro, but there are
a few that get special treatment (different set on each arch - on
x86-64 it's execve and rt_sigreturn ; on i386 it's just clone).
Is there's something special about execve in particular, or do you
want to overhaul all the ptregscalls?
Meredydd
^ permalink raw reply [flat|nested] 4+ messages in thread
* Re: [PATCH] [RFC] syscalls,x86: Add execveat() system call (v2)
2012-08-02 9:14 ` Meredydd Luff
@ 2012-08-02 10:30 ` Al Viro
2012-08-02 10:51 ` Meredydd Luff
0 siblings, 1 reply; 4+ messages in thread
From: Al Viro @ 2012-08-02 10:30 UTC (permalink / raw)
To: Meredydd Luff
Cc: H. Peter Anvin, linux-kernel, Kees Cook, Ingo Molnar, Jeff Dike,
Richard Weinberger, Andrew Morton, linux-arch
On Thu, Aug 02, 2012 at 10:14:53AM +0100, Meredydd Luff wrote:
> On Thu, Aug 2, 2012 at 7:55 AM, Al Viro <viro@zeniv.linux.org.uk> wrote:
> >> This means you need an x32 version of the function -- execve
> >> unfortunately is one of the few system calls which require a special x32
> >> version (although it's a simple wrapper around sys32_execve). See
> >> sys_x32_execve.
> >
> > I *really* strongly object to doing that thing before we sanitize the
> > situation with sys_execve().
>
> "That thing" = "creating an x32 entry stub", or "merging execveat() at all"?
>
> (snip)
> > The thing is, there's essentially no reason to have more than one
> > implementation. What they are (badly) doing is "we need to find
> > pt_regs to pass to do_execve(), the thing we are after has to be near
> > our stack frame, so let's try to get to it that way".
>
> Hang on...it's not just sys_execve that fits that description, is it?
> You seem to be describing every call that needs a pt_regs parameter,
> which at a glance is anything with a stub_ or PTREGSCALL in
> arch/x86/kernel/entry_{32,64}.S. That's: clone, fork, vfork,
> sigaltstack, iopl, execve, sigreturn, rt_sigreturn, vm86, vm86old.
> Most of those are handled by a common PTREGSCALL macro, but there are
> a few that get special treatment (different set on each arch - on
> x86-64 it's execve and rt_sigreturn ; on i386 it's just clone).
>
> Is there's something special about execve in particular, or do you
> want to overhaul all the ptregscalls?
You are looking at that from the wrong side; it's not that this
set of syscalls on x86 has the same kind of wrapper - it's that
on different architectures the kludges are seriously different
and fairly brittle. sigaltstack(), BTW, doesn't need to be
arch-specific at all - if you check what its pt_regs argument
is used for, we just need something like current_user_stack_pointer()
and that's it. It's a different patch series, anyway.
There's a difference between "here's the syscall implementation,
here's its hookup for x86, let other architectures update their
syscall tables" and "here's the implementation in arch/x86 and its
hookup for x86; let other architectures port it in whatever way
they need". The latter is a recipe for breakage - we'd been
through that kind of story quite a few times.
Out of that set, vm86/vm86old/iopl are genuine x86-only syscalls.
sigreturn and rt_sigreturn are weird, subtle and, sadly,
unmergable - syscall restart prevention logics is different
enough to make it not feasible at the moment. It's a serious
source of PITA, BTW - quite a few of those had the same bugs,
years after they'd been discovered and fixed on a subset of
architectures. Been there, fixed some of those, the latest
batch this April... fork/vfork/clone is an interesting
question - IIRC, there are seriously subtle issues of some
sort on sparc, but I don't remember details right now.
Really, go and grep for do_execve; most of the callers will
be in execve(2) implementations. Look at them carefully -
while some get pt_regs from some wrapper, there's a bunch
that does that manually in C. "Ugly" doesn't even begin
to describe what's being done...
FWIW, I've just pushed (completely untested) arm and alpha
parts of what I described into signal.git#execve2; x86 is
next. Note that after that sys_execve() is identical on
converted architectures and can be merged; ditto for
kernel_execve(). After I do x86 counterpart, I'll
take those guys to fs/exec.c under ifdef for new __ARCH_HAS_...
(and define it on already converted ones, obviously).
Then your patch goes there, except that implementation
gets put into fs/exec.c, under the same ifdef. And with
current_pt_regs() used instead of the extra argument,
of course. From that point on it can be used on any converted
architecture. And conversion consists of
* providing ret_from_kernel_execve() that would
put pt_regs where they belong and return to userland.
* providing current_pt_regs(), with default being
just task_pt_regs(current).
* defining that new __ARCH_HAS_...
* removing sys_execve()/kernel_execve() implementations;
the ones in fs/exec.c will work just fine.
Can be done at leisure, architecture by architecture. It's
obviously the next cycle fodder - we have only a couple of
days left of this merge window, anyway.
I really think that this pair of primitives is the right
way to deal with execve mess.
^ permalink raw reply [flat|nested] 4+ messages in thread
* Re: [PATCH] [RFC] syscalls,x86: Add execveat() system call (v2)
2012-08-02 10:30 ` Al Viro
@ 2012-08-02 10:51 ` Meredydd Luff
0 siblings, 0 replies; 4+ messages in thread
From: Meredydd Luff @ 2012-08-02 10:51 UTC (permalink / raw)
To: Al Viro
Cc: H. Peter Anvin, linux-kernel, Kees Cook, Ingo Molnar, Jeff Dike,
Richard Weinberger, Andrew Morton, linux-arch
On Thu, Aug 2, 2012 at 11:30 AM, Al Viro <viro@zeniv.linux.org.uk> wrote:
> FWIW, I've just pushed (completely untested) arm and alpha
> parts of what I described into signal.git#execve2; x86 is
> next. Note that after that sys_execve() is identical on
> converted architectures and can be merged; ditto for
> kernel_execve(). After I do x86 counterpart, I'll
> take those guys to fs/exec.c under ifdef for new __ARCH_HAS_...
> (and define it on already converted ones, obviously).
> Then your patch goes there, except that implementation
> gets put into fs/exec.c, under the same ifdef. And with
> current_pt_regs() used instead of the extra argument,
> of course. From that point on it can be used on any converted
> architecture.
OK, that makes sense to me.
What would you need from me, and when? Should I just wait for your
#ifdef-ed sys_execve() in fs/exec.c, and re-spin the patch based on
that?
Is there anything else I can/should do in the meantime to make this
patch acceptable?
Meredydd
^ permalink raw reply [flat|nested] 4+ messages in thread
end of thread, other threads:[~2012-08-02 10:51 UTC | newest]
Thread overview: 4+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
[not found] <1343859049-3632-1-git-send-email-meredydd@senatehouse.org>
[not found] ` <5019B36A.4030604@zytor.com>
[not found] ` <CAD=T17Esguer01OsrxMtqEcHKZ-Ovg05+MxSdF7NhXENsUubgg@mail.gmail.com>
[not found] ` <5019BC0E.4010109@zytor.com>
2012-08-02 6:55 ` [PATCH] [RFC] syscalls,x86: Add execveat() system call (v2) Al Viro
2012-08-02 9:14 ` Meredydd Luff
2012-08-02 10:30 ` Al Viro
2012-08-02 10:51 ` Meredydd Luff
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox