* Re: [PATCH] [RFC] syscalls,x86: Add execveat() system call (v2) [not found] ` <5019BC0E.4010109@zytor.com> @ 2012-08-02 6:55 ` Al Viro 2012-08-02 9:14 ` Meredydd Luff 0 siblings, 1 reply; 4+ messages in thread From: Al Viro @ 2012-08-02 6:55 UTC (permalink / raw) To: H. Peter Anvin Cc: Meredydd Luff, linux-kernel, Kees Cook, Ingo Molnar, Jeff Dike, Richard Weinberger, Andrew Morton, linux-arch On Wed, Aug 01, 2012 at 04:30:22PM -0700, H. Peter Anvin wrote: > On 08/01/2012 04:09 PM, Meredydd Luff wrote: > >>> # > >>> # x32-specific system call numbers start at 512 to avoid cache impact > >> > >> I think that should be common, not 64 (as should kcmp be). > > > > I copied the original execve, which is 64. > > > > Sorry, you're right. The argument vector needs compatibility support. > > This means you need an x32 version of the function -- execve > unfortunately is one of the few system calls which require a special x32 > version (although it's a simple wrapper around sys32_execve). See > sys_x32_execve. I *really* strongly object to doing that thing before we sanitize the situation with sys_execve(). As it is, the damn thing is defined separately on each architecture, with spectaculary ugly kludges used in these implementations. Adding a parallel pile of kludges (and due to their nature, they'll need to be changed in non-trivial way in a lot of cases) is simply wrong. The thing is, there's essentially no reason to have more than one implementation. What they are (badly) doing is "we need to find pt_regs to pass to do_execve(), the thing we are after has to be near our stack frame, so let's try to get to it that way". With really ugly set of kludges trying to do just that. What we should use instead is task_pt_regs(); maybe introduce current_pt_regs(), defaulting to task_pt_regs(current) and letting architectures that can do it better (on some it's simply available in dedicated register, on some it's better to work from current_thread_info(), etc.) override the default. With that we have a fairly good chance to merge most of those guys; probably not all of them, due to e.g. mips weirdness, but enough to make it worth doing. The obstacle is in lazy kernel_execve() implementations; ones that simply issue a trap/whatever is used to enter the system call. Directly from kernel space. It doesn't have to be done that way; see what e.g. arm does there. Note that doing it without syscall instruction avoids another headache; namely, we don't have to worry about returning from *failed* execve (i.e. return to kernel mode) through the codepath that is normally taken only when returning to userland. FWIW, I would try to pull the asm tail of arm kernel_execve() into something that would look to C side as ret_from_kernel_exec(®s); /* never returns */ and start converting architectures to that primitive. It should copy the provided pt_regs to normal location (keeping in mind that there really might be an overlap), set registers (including stack pointer) for normal return to user path and jump there. Essentially, that's the real arch-dependent part of kernel_execve() - transition from kernel thread to userland process. It can be done architecture-by-architecture; there's no need to make it a flagday conversion. Once an arch is handled, we define something like __ARCH_HAS_RET_FROM_KERNEL_EXEC and get the common implementations of kernel_execve() and sys_execve() for that - those could simply live in fs/exec.c under the matching ifdef. Along with your sys_execveat(). I can probably throw alpha, arm and x86 conversions into the pile, but it really needs to be handled on linux-arch, with arch maintainers at least agreeing in principle with that scheme. ^ permalink raw reply [flat|nested] 4+ messages in thread
* Re: [PATCH] [RFC] syscalls,x86: Add execveat() system call (v2) 2012-08-02 6:55 ` [PATCH] [RFC] syscalls,x86: Add execveat() system call (v2) Al Viro @ 2012-08-02 9:14 ` Meredydd Luff 2012-08-02 10:30 ` Al Viro 0 siblings, 1 reply; 4+ messages in thread From: Meredydd Luff @ 2012-08-02 9:14 UTC (permalink / raw) To: Al Viro Cc: H. Peter Anvin, linux-kernel, Kees Cook, Ingo Molnar, Jeff Dike, Richard Weinberger, Andrew Morton, linux-arch On Thu, Aug 2, 2012 at 7:55 AM, Al Viro <viro@zeniv.linux.org.uk> wrote: >> This means you need an x32 version of the function -- execve >> unfortunately is one of the few system calls which require a special x32 >> version (although it's a simple wrapper around sys32_execve). See >> sys_x32_execve. > > I *really* strongly object to doing that thing before we sanitize the > situation with sys_execve(). "That thing" = "creating an x32 entry stub", or "merging execveat() at all"? (snip) > The thing is, there's essentially no reason to have more than one > implementation. What they are (badly) doing is "we need to find > pt_regs to pass to do_execve(), the thing we are after has to be near > our stack frame, so let's try to get to it that way". Hang on...it's not just sys_execve that fits that description, is it? You seem to be describing every call that needs a pt_regs parameter, which at a glance is anything with a stub_ or PTREGSCALL in arch/x86/kernel/entry_{32,64}.S. That's: clone, fork, vfork, sigaltstack, iopl, execve, sigreturn, rt_sigreturn, vm86, vm86old. Most of those are handled by a common PTREGSCALL macro, but there are a few that get special treatment (different set on each arch - on x86-64 it's execve and rt_sigreturn ; on i386 it's just clone). Is there's something special about execve in particular, or do you want to overhaul all the ptregscalls? Meredydd ^ permalink raw reply [flat|nested] 4+ messages in thread
* Re: [PATCH] [RFC] syscalls,x86: Add execveat() system call (v2) 2012-08-02 9:14 ` Meredydd Luff @ 2012-08-02 10:30 ` Al Viro 2012-08-02 10:51 ` Meredydd Luff 0 siblings, 1 reply; 4+ messages in thread From: Al Viro @ 2012-08-02 10:30 UTC (permalink / raw) To: Meredydd Luff Cc: H. Peter Anvin, linux-kernel, Kees Cook, Ingo Molnar, Jeff Dike, Richard Weinberger, Andrew Morton, linux-arch On Thu, Aug 02, 2012 at 10:14:53AM +0100, Meredydd Luff wrote: > On Thu, Aug 2, 2012 at 7:55 AM, Al Viro <viro@zeniv.linux.org.uk> wrote: > >> This means you need an x32 version of the function -- execve > >> unfortunately is one of the few system calls which require a special x32 > >> version (although it's a simple wrapper around sys32_execve). See > >> sys_x32_execve. > > > > I *really* strongly object to doing that thing before we sanitize the > > situation with sys_execve(). > > "That thing" = "creating an x32 entry stub", or "merging execveat() at all"? > > (snip) > > The thing is, there's essentially no reason to have more than one > > implementation. What they are (badly) doing is "we need to find > > pt_regs to pass to do_execve(), the thing we are after has to be near > > our stack frame, so let's try to get to it that way". > > Hang on...it's not just sys_execve that fits that description, is it? > You seem to be describing every call that needs a pt_regs parameter, > which at a glance is anything with a stub_ or PTREGSCALL in > arch/x86/kernel/entry_{32,64}.S. That's: clone, fork, vfork, > sigaltstack, iopl, execve, sigreturn, rt_sigreturn, vm86, vm86old. > Most of those are handled by a common PTREGSCALL macro, but there are > a few that get special treatment (different set on each arch - on > x86-64 it's execve and rt_sigreturn ; on i386 it's just clone). > > Is there's something special about execve in particular, or do you > want to overhaul all the ptregscalls? You are looking at that from the wrong side; it's not that this set of syscalls on x86 has the same kind of wrapper - it's that on different architectures the kludges are seriously different and fairly brittle. sigaltstack(), BTW, doesn't need to be arch-specific at all - if you check what its pt_regs argument is used for, we just need something like current_user_stack_pointer() and that's it. It's a different patch series, anyway. There's a difference between "here's the syscall implementation, here's its hookup for x86, let other architectures update their syscall tables" and "here's the implementation in arch/x86 and its hookup for x86; let other architectures port it in whatever way they need". The latter is a recipe for breakage - we'd been through that kind of story quite a few times. Out of that set, vm86/vm86old/iopl are genuine x86-only syscalls. sigreturn and rt_sigreturn are weird, subtle and, sadly, unmergable - syscall restart prevention logics is different enough to make it not feasible at the moment. It's a serious source of PITA, BTW - quite a few of those had the same bugs, years after they'd been discovered and fixed on a subset of architectures. Been there, fixed some of those, the latest batch this April... fork/vfork/clone is an interesting question - IIRC, there are seriously subtle issues of some sort on sparc, but I don't remember details right now. Really, go and grep for do_execve; most of the callers will be in execve(2) implementations. Look at them carefully - while some get pt_regs from some wrapper, there's a bunch that does that manually in C. "Ugly" doesn't even begin to describe what's being done... FWIW, I've just pushed (completely untested) arm and alpha parts of what I described into signal.git#execve2; x86 is next. Note that after that sys_execve() is identical on converted architectures and can be merged; ditto for kernel_execve(). After I do x86 counterpart, I'll take those guys to fs/exec.c under ifdef for new __ARCH_HAS_... (and define it on already converted ones, obviously). Then your patch goes there, except that implementation gets put into fs/exec.c, under the same ifdef. And with current_pt_regs() used instead of the extra argument, of course. From that point on it can be used on any converted architecture. And conversion consists of * providing ret_from_kernel_execve() that would put pt_regs where they belong and return to userland. * providing current_pt_regs(), with default being just task_pt_regs(current). * defining that new __ARCH_HAS_... * removing sys_execve()/kernel_execve() implementations; the ones in fs/exec.c will work just fine. Can be done at leisure, architecture by architecture. It's obviously the next cycle fodder - we have only a couple of days left of this merge window, anyway. I really think that this pair of primitives is the right way to deal with execve mess. ^ permalink raw reply [flat|nested] 4+ messages in thread
* Re: [PATCH] [RFC] syscalls,x86: Add execveat() system call (v2) 2012-08-02 10:30 ` Al Viro @ 2012-08-02 10:51 ` Meredydd Luff 0 siblings, 0 replies; 4+ messages in thread From: Meredydd Luff @ 2012-08-02 10:51 UTC (permalink / raw) To: Al Viro Cc: H. Peter Anvin, linux-kernel, Kees Cook, Ingo Molnar, Jeff Dike, Richard Weinberger, Andrew Morton, linux-arch On Thu, Aug 2, 2012 at 11:30 AM, Al Viro <viro@zeniv.linux.org.uk> wrote: > FWIW, I've just pushed (completely untested) arm and alpha > parts of what I described into signal.git#execve2; x86 is > next. Note that after that sys_execve() is identical on > converted architectures and can be merged; ditto for > kernel_execve(). After I do x86 counterpart, I'll > take those guys to fs/exec.c under ifdef for new __ARCH_HAS_... > (and define it on already converted ones, obviously). > Then your patch goes there, except that implementation > gets put into fs/exec.c, under the same ifdef. And with > current_pt_regs() used instead of the extra argument, > of course. From that point on it can be used on any converted > architecture. OK, that makes sense to me. What would you need from me, and when? Should I just wait for your #ifdef-ed sys_execve() in fs/exec.c, and re-spin the patch based on that? Is there anything else I can/should do in the meantime to make this patch acceptable? Meredydd ^ permalink raw reply [flat|nested] 4+ messages in thread
end of thread, other threads:[~2012-08-02 10:51 UTC | newest]
Thread overview: 4+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
[not found] <1343859049-3632-1-git-send-email-meredydd@senatehouse.org>
[not found] ` <5019B36A.4030604@zytor.com>
[not found] ` <CAD=T17Esguer01OsrxMtqEcHKZ-Ovg05+MxSdF7NhXENsUubgg@mail.gmail.com>
[not found] ` <5019BC0E.4010109@zytor.com>
2012-08-02 6:55 ` [PATCH] [RFC] syscalls,x86: Add execveat() system call (v2) Al Viro
2012-08-02 9:14 ` Meredydd Luff
2012-08-02 10:30 ` Al Viro
2012-08-02 10:51 ` Meredydd Luff
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox