linux-api.vger.kernel.org archive mirror
* execve is not atomic, what is the exit state of the process when execve fails after throwing away the original process image?
@ 2014-05-02  2:19 Steven Stewart-Gallus
From: Steven Stewart-Gallus @ 2014-05-02  2:19 UTC
  To: linux-api-u79uwXL29TY76Z2rM5mHXA

Hello, can any kernel gurus please help me out with this weird
corner case?


Quoting Rich Felker at http://ewontfix.com/14/:

> For common reasons it might fail, the execve syscall returns failure
> in the original process image, allowing the program to handle the
> error. However, failure of execve is not entirely atomic:
> 
>     The kernel may fail setting up the VM for the new process image
>     after the original VM has already been destroyed; the main
>     situation under which this would happen is resource exhaustion.
> 
>     Even after the kernel successfully sets up the new VM and
>     transfers execution to the new process image, it's possible to
>     have failures prior to the transfer of control to the actual
>     application program. This could happen in the dynamic linker
>     (resource exhaustion or other transient failures mapping required
>     libraries or loading configuration files) or libc startup
>     code. Using musl libc with static linking or even dynamic linking
>     with no additional libraries eliminates these failure cases, but
>     systemd is intended to be used with glibc.

execve is not atomic. What is the exit status of the process when
execve fails after throwing away the original process image?

I already have a hacky solution (see
https://gitorious.org/linted/linted/source/d25eac7a6fb58946ec8112771c0c56eb39fd1055:src/spawn/spawn.c)
for detecting when a process fails before or at execve, and now I want
to know what exit status is returned so that I can give a nice error
message.
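One widely used way to tell an early execve() failure apart from one past the point of no return is a close-on-exec pipe between parent and child. The sketch below is not necessarily what the linked spawn.c does; it assumes Linux with glibc (for pipe2()), and the helper name spawn_report_errno() is hypothetical.

```c
#define _GNU_SOURCE             /* pipe2() is a Linux/glibc extension */
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper (not from the thread): fork and execv() `path`.
   Returns an errno value if pipe2(), fork() or an early execve() failed,
   or 0 if the child got past the point where execve() can return.
   *child_pid receives the child's PID (or -1); the caller must reap it. */
static int spawn_report_errno(const char *path, char *const argv[],
                              pid_t *child_pid)
{
    int fds[2];
    int err = 0;
    ssize_t n;
    pid_t pid;

    assert(path != NULL && argv != NULL && child_pid != NULL);
    *child_pid = -1;

    /* The write end is close-on-exec: once exec replaces the image (or
       the child dies), the parent's read() sees end-of-file. */
    if (pipe2(fds, O_CLOEXEC) == -1)
        return errno;

    pid = fork();
    if (pid == -1) {
        err = errno;
        close(fds[0]);
        close(fds[1]);
        return err;
    }

    if (pid == 0) {                     /* Child */
        close(fds[0]);
        execv(path, argv);
        err = errno;                    /* execv() returned: early failure */
        n = write(fds[1], &err, sizeof err);
        (void) n;
        _exit(127);
    }

    close(fds[1]);                      /* Parent */
    do
        n = read(fds[0], &err, sizeof err);
    while (n == -1 && errno == EINTR);
    close(fds[0]);
    *child_pid = pid;

    /* A full errno value means execve() failed while the original image
       still existed. EOF means exec went past the point of no return;
       a later waitpid() may still show SIGKILL if it failed after that. */
    return (n == (ssize_t) sizeof err) ? err : 0;
}
```

If the helper returns 0, the caller still has to waitpid() the child: a "killed by SIGKILL" status at that point may be the late-failure case discussed in this thread, although it cannot be distinguished from any other SIGKILL.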


* Re: execve is not atomic, what is the exit state of the process when execve fails after throwing away the original process image?
@ 2014-05-03 17:45 Jann Horn
From: Jann Horn @ 2014-05-03 17:45 UTC
  To: Steven Stewart-Gallus; +Cc: linux-api-u79uwXL29TY76Z2rM5mHXA


On Fri, May 02, 2014 at 02:19:52AM +0000, Steven Stewart-Gallus wrote:
> execve is not atomic, what is the exit state of the process when
> execve fails after throwing away the original process image?

See http://lxr.free-electrons.com/source/fs/binfmt_elf.c#L740 or
so – as far as I know, the kernel sends a SIGKILL. Does that help?
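Once the parent has the wait status, a message for the user can be built with the standard W* macros. This is a minimal sketch; describe_child_status() is a hypothetical name, and the SIGKILL wording is deliberately tentative because the status alone cannot prove that execve() was the cause.

```c
#define _GNU_SOURCE             /* strsignal() */
#include <assert.h>
#include <signal.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: describe a waitpid() status for an error message. */
static void describe_child_status(int status, char *buf, size_t len)
{
    assert(buf != NULL && len > 0);

    if (WIFEXITED(status)) {
        snprintf(buf, len, "exited with status %d", WEXITSTATUS(status));
    } else if (WIFSIGNALED(status) && WTERMSIG(status) == SIGKILL) {
        /* Could be a late execve() failure, but equally the OOM killer
           or an explicit kill -9: the status cannot tell them apart. */
        snprintf(buf, len, "killed by SIGKILL (possibly a late execve() failure)");
    } else if (WIFSIGNALED(status)) {
        snprintf(buf, len, "killed by signal %d (%s)",
                 WTERMSIG(status), strsignal(WTERMSIG(status)));
    } else {
        /* Only reported when waiting with WUNTRACED or WCONTINUED */
        snprintf(buf, len, "stopped or continued (status=%#x)",
                 (unsigned int) status);
    }
}
```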



* Re: execve is not atomic, what is the exit state of the process when execve fails after throwing away the original process image?
@ 2014-05-03 22:18 Steven Stewart-Gallus
From: Steven Stewart-Gallus @ 2014-05-03 22:18 UTC
  To: Jann Horn; +Cc: linux-api-u79uwXL29TY76Z2rM5mHXA, Michael Kerrisk (man-pages)

Thank you, Jann Horn. http://lxr.free-electrons.com/source/fs/binfmt_elf.c#L740
answers my question.

On reflection, the kernel code makes sense. At that point the process
must either exit with an error code or be killed by a signal, and
SIGKILL is the natural choice because SIGKILL and SIGSTOP are the only
signals that cannot be caught, blocked, or ignored (of course, the
kernel is privileged enough to do whatever it wants, but it tries to
stay consistent with userspace).
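The point about SIGKILL and SIGSTOP can be checked from userspace: sigaction() rejects them with EINVAL, and sigprocmask() silently leaves them unblocked, as POSIX requires. A small sketch, assuming a Linux/POSIX system; the helper names are hypothetical.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <string.h>

static void dummy_handler(int sig) { (void) sig; }

/* Returns 0 if a handler could be installed for `sig`,
   otherwise the errno reported by sigaction(). */
static int try_install_handler(int sig)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = dummy_handler;
    sigemptyset(&sa.sa_mask);
    return sigaction(sig, &sa, NULL) == -1 ? errno : 0;
}

/* Returns 1 if `sig` really ends up in the signal mask after asking
   sigprocmask() to block it; the original mask is restored. */
static int can_block(int sig)
{
    sigset_t set, old, now;
    int blocked;

    sigemptyset(&set);
    sigaddset(&set, sig);
    if (sigprocmask(SIG_BLOCK, &set, &old) == -1)
        return 0;
    sigprocmask(SIG_BLOCK, NULL, &now);     /* NULL set: just query */
    blocked = sigismember(&now, sig);
    sigprocmask(SIG_SETMASK, &old, NULL);   /* restore */
    return blocked == 1;
}
```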

Strangely, in other places SIGSEGV is sent instead when the ELF file
is malformed in certain ways, and I don't fully understand that part
of the code. Still, I understand enough to look at the code in more
detail later.

Thank you,
Steven Stewart-Gallus

P.S.

I'm CC'ing Michael because he wanted to know about this case so that
he could document it.


* Re: execve is not atomic, what is the exit state of the process when execve fails after throwing away the original process image?
@ 2014-05-04 19:27 Michael Kerrisk (man-pages)
From: Michael Kerrisk (man-pages) @ 2014-05-04 19:27 UTC
  To: Steven Stewart-Gallus, Jann Horn
  Cc: mtk.manpages-Re5JQEeQqe8AvxtiuMwx3w,
	linux-api-u79uwXL29TY76Z2rM5mHXA, Rich Felker,
	linux-man-u79uwXL29TY76Z2rM5mHXA@public.gmane.org

[CC+=Rich Felker, because the discussion started with a reference to
http://ewontfix.com/14/ ]

On 05/04/2014 12:18 AM, Steven Stewart-Gallus wrote:
> P.S.
> 
> I'm CC'ing Michael because he wanted to know about this case so that
> he could document it.

Fair enough. I plan to add the following text to the execve(2) man
page:

       In most cases where execve()  fails,  control  returns  to  the
       original  executable image, and the caller of execve() can then
       handle the error.  However, in (rare) cases  (typically  caused
       by resource exhaustion), failure may occur past the point of no
       return: the original executable image has been torn  down,  but
       the  new  image  could not be completely built.  In such cases,
       the kernel kills the process with a SIGKILL signal.
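From the caller's side, the common case described in the proposed text, where execve() returns -1 and the original image is still running, looks like this. A minimal sketch; exec_or_report() is a hypothetical name, and a successful call would of course never return.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: try to exec `path` with an empty environment.
   If execve() returns at all, it failed before the point of no return,
   so the original image is intact and can report the error itself. */
static int exec_or_report(const char *path)
{
    char *const argv[] = { (char *) path, NULL };
    char *const envp[] = { NULL };
    int err;

    assert(path != NULL);
    execve(path, argv, envp);
    err = errno;                 /* still in the original image */
    fprintf(stderr, "execve(%s): %s\n", path, strerror(err));
    return err;
}
```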

Comments?

Cheers,

Michael




-- 
Michael Kerrisk
Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
Linux/UNIX System Programming Training: http://man7.org/training/


* Re: execve is not atomic, what is the exit state of the process when execve fails after throwing away the original process image?
@ 2014-05-04 20:15 Michael Kerrisk (man-pages)
From: Michael Kerrisk (man-pages) @ 2014-05-04 20:15 UTC
  To: Steven Stewart-Gallus, Jann Horn
  Cc: Michael Kerrisk, Linux API, Rich Felker,
	linux-man-u79uwXL29TY76Z2rM5mHXA@public.gmane.org


On Sun, May 4, 2014 at 9:27 PM, Michael Kerrisk (man-pages)
<mtk.manpages-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> wrote:
> Fair enough. I plan to add the following text to the execve(2) man
> page:
>
>        In most cases where execve()  fails,  control  returns  to  the
>        original  executable image, and the caller of execve() can then
>        handle the error.  However, in (rare) cases  (typically  caused
>        by resource exhaustion), failure may occur past the point of no
>        return: the original executable image has been torn  down,  but
>        the  new  image  could not be completely built.  In such cases,
>        the kernel kills the process with a SIGKILL signal.
>
> Comments?

It turns out that this case is not too hard to trigger. See, for
example, the attached pair of programs and the shell log below.

Cheers,

Michael

# Beware: if you try the below, the OOM killer may kill something random
# (Okay, not random: probably it'll be that hog firefox ;-).)

# Disable memory overcommit (see proc(5))
$ sudo sh -c "echo 2 > /proc/sys/vm/overcommit_memory"

$ ./multi_fork_exec ./large_image
cnt = 0
cnt = 1
cnt = 2
cnt = 3
[...]
cnt = 213
cnt = 214
cnt = 215
    Child PID=26070
    Status: child killed by signal 9 (Killed)
    Child PID=26062
    Status: child killed by signal 9 (Killed)
    Child PID=26053
    Status: child killed by signal 9 (Killed)
    Child PID=25900
    Status: child killed by signal 9 (Killed)
[...]
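Before trying a demo like this, it may help to check the overcommit mode and how close the system is to its commit limit. In mode 2 (strict accounting, see proc(5)) new commitments are refused once Committed_AS would exceed CommitLimit, which is presumably what makes large_image's 100 MB buffer fail to map. A sketch with hypothetical helper names, reading the standard /proc files:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Read /proc/sys/vm/overcommit_memory (0, 1 or 2). Returns -1 on error. */
static int read_overcommit_mode(void)
{
    FILE *f = fopen("/proc/sys/vm/overcommit_memory", "r");
    int mode = -1;

    if (f == NULL)
        return -1;
    if (fscanf(f, "%d", &mode) != 1)
        mode = -1;
    fclose(f);
    return mode;
}

/* Look up a "Key:   value kB" line in /proc/meminfo.
   Returns the value in kB, or -1 if the key is not found. */
static long meminfo_kb(const char *key)
{
    char line[256];
    long value = -1;
    size_t keylen;
    FILE *f;

    assert(key != NULL);
    keylen = strlen(key);
    f = fopen("/proc/meminfo", "r");
    if (f == NULL)
        return -1;
    while (fgets(line, sizeof line, f) != NULL) {
        /* Require "Key:" exactly, so a prefix of another key can't match */
        if (strncmp(line, key, keylen) == 0 && line[keylen] == ':') {
            if (sscanf(line + keylen + 1, "%ld", &value) != 1)
                value = -1;
            break;
        }
    }
    fclose(f);
    return value;
}
```

For example, printing meminfo_kb("CommitLimit") and meminfo_kb("Committed_AS") while the demo runs should show the second value climbing toward the first when overcommit_memory is 2.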


[-- Attachment #2: multi_fork_exec.c --]
[-- Type: text/x-csrc, Size: 3191 bytes --]

/*#* multi_fork_exec.c 
 
   Use with large_image.c to trigger this execve() case:


      In most cases where execve()  fails,  control  returns  to  the
      original  executable image, and the caller of execve() can then
      handle the error.  However, in (rare) cases  (typically  caused
      by resource exhaustion), failure may occur past the point of no
      return: the original executable image has been torn  down,  but
      the  new  image  could not be completely built.  In such cases,
      the kernel kills the process with a SIGKILL signal.
*/
/*#**
   Change history

   04 May 14	Initial creation
*/
#define _GNU_SOURCE     /* For strsignal() declaration */
#include <sys/types.h>
#include <sys/wait.h>
#include <signal.h>     /* sigaction(), sigemptyset() */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>     /* strsignal() */
#include <unistd.h>
#include <errno.h>

#define errExit(msg) 	do { perror(msg); exit(EXIT_FAILURE); \
                        } while (0)


static void 	/* Examine a wait() status using the W* macros */
printWaitStatus(const char *msg, int status)
{
    if (msg != NULL)
        printf("%s", msg);

    if (WIFEXITED(status)) {
        printf("child exited, status=%d\n", WEXITSTATUS(status));

    } else if (WIFSIGNALED(status)) {
        printf("child killed by signal %d (%s)",
                WTERMSIG(status), strsignal(WTERMSIG(status)));
#ifdef WCOREDUMP    	/* Not in SUSv3, may be absent on some systems */
        if (WCOREDUMP(status))
            printf(" (core dumped)");
#endif
        printf("\n");

    } else if (WIFSTOPPED(status)) {
        printf("child stopped by signal %d (%s)\n",
                WSTOPSIG(status), strsignal(WSTOPSIG(status)));

#ifdef WIFCONTINUED 	/* SUSv3 has this, but older Linux versions and
                           some other UNIX implementations don't */
    } else if (WIFCONTINUED(status)) {
        printf("child continued\n");
#endif

    } else {		/* Should never happen */
        printf("what happened to this child? (status=%x)\n",
                (unsigned int) status);
    }
}

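static void             /* Handler for child termination signal */
grimReaper(int sig)
{
    int status;                 /* Child status from waitpid() */
    pid_t pid;
    int savedErrno;

    (void) sig;                 /* Unused */
    savedErrno = errno;         /* waitpid() below may change 'errno' */

    /* WNOHANG: reap every child that has already terminated, but don't
       block in the handler waiting for children that are still running.
       (printf() is not async-signal-safe; tolerable in a demo.) */
    while ((pid = waitpid(-1, &status, WNOHANG)) > 0) {
        printf("\tChild PID=%ld\n", (long) pid);
        printWaitStatus("\tStatus: ", status);
    }
    if (pid == -1 && errno != ECHILD)
        errExit("waitpid");

    errno = savedErrno;
}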

int
main(int argc, char *argv[])
{
    int cnt;
    pid_t cpid;
    struct sigaction sa;

    if (argc < 2) {
        fprintf(stderr, "Usage: %s program [arg...]\n", argv[0]);
        exit(EXIT_FAILURE);
    }

    /* Set up handler to reap dead children */

    sa.sa_flags = 0;
    sa.sa_handler = grimReaper;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGCHLD, &sa, NULL) == -1)
        errExit("sigaction");

    /* Create multiple children, each of which execs the program named in
       argv[1] */

    for (cnt = 0; ; cnt++) {
        printf("cnt = %d\n", cnt);

        cpid = fork();
        if (cpid == -1)
            errExit("fork");

        if (cpid == 0) {	/* Child */
            execv(argv[1], &argv[1]);
            errExit("execv");
        }

        /* Parent continues round loop */
    }

    exit(EXIT_SUCCESS);
}

[-- Attachment #3: large_image.c --]
[-- Type: text/x-csrc, Size: 292 bytes --]

/*#* large_image.c 
*/
/*#**
   Change history

   04 May 14	Initial creation
*/
#include <unistd.h>
#include <stdlib.h>

/* Make this image large, to chew up a good bit of RAM/swap */

char buf[100 * 1000 * 1000];

int
main(int argc, char *argv[])
{
    sleep(30);
    exit(EXIT_SUCCESS);
}

