Linux Container Development
  • * For review: pid_namespaces(7) man page
    @ 2014-08-20 23:38 Michael Kerrisk (man-pages)
    From: Michael Kerrisk (man-pages) @ 2014-08-20 23:38 UTC (permalink / raw)
      To: Eric W. Biederman
      Cc: linux-man-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
    	richard.weinberger-Re5JQEeQqe8AvxtiuMwx3w,
    	containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA, lkml,
    	Andy Lutomirski, mtk.manpages-Re5JQEeQqe8AvxtiuMwx3w
    
    [-- Attachment #1: Type: text/plain, Size: 11429 bytes --]
    
    Hello Eric et al.
    
    Here is the current draft of the pid_namespaces(7) man page, which
    describes PID namespaces. The rendered version is below, and the
    source is attached.
    
    Review comments/suggestions for improvements / bug fixes welcome.
    
    Cheers,
    
    Michael
    
    ==
    
    NAME
           pid_namespaces - overview of Linux PID namespaces
    
    DESCRIPTION
           For an overview of namespaces, see namespaces(7).
    
           PID  namespaces isolate the process ID number space, meaning that
           processes in different PID namespaces can have the same PID.  PID
           namespaces allow containers to provide functionality such as sus‐
           pending/resuming the  set  of  processes  in  the  container  and
           migrating  the container to a new host while the processes inside
           the container maintain the same PIDs.
    
           PIDs in a new PID namespace start at 1, somewhat  like  a  stand‐
           alone  system,  and  calls to fork(2), vfork(2), or clone(2) will
           produce processes with PIDs that are unique within the namespace.
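
           As a minimal sketch of the numbering rule above (not part of the
           draft page), the hypothetical helper below creates a child with
           clone(2) and CLONE_NEWPID and returns the value that the child's
           own getpid(2) call reported over a pipe, which should be 1.  The
           function names, the pipe, and the stack size are illustrative
           assumptions.  The call needs CAP_SYS_ADMIN; without it the helper
           returns -1 with errno set (typically EPERM).

```c
#define _GNU_SOURCE
#include <errno.h>
#include <sched.h>
#include <signal.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#define STACK_SIZE (1024 * 1024)

/* Child entry point: send our own view of getpid() back through the pipe. */
static int report_own_pid(void *arg)
{
    int wfd = *(int *)arg;
    pid_t self = getpid();

    return write(wfd, &self, sizeof(self)) == (ssize_t)sizeof(self) ? 0 : 1;
}

/*
 * Create a child in a new PID namespace and return the PID that the child
 * sees for itself (expected to be 1).  Returns -1 with errno set on failure,
 * for example EPERM when the caller lacks CAP_SYS_ADMIN.
 */
pid_t first_pid_in_new_namespace(void)
{
    int pfd[2];
    char *stack;
    pid_t child, seen = -1;
    int saved_errno;

    if (pipe(pfd) == -1)
        return -1;

    stack = malloc(STACK_SIZE);
    if (stack == NULL) {
        close(pfd[0]);
        close(pfd[1]);
        errno = ENOMEM;
        return -1;
    }

    /* The stack grows downward on most architectures, so pass its top. */
    child = clone(report_own_pid, stack + STACK_SIZE,
                  CLONE_NEWPID | SIGCHLD, &pfd[1]);
    saved_errno = errno;
    close(pfd[1]);

    if (child != -1) {
        if (read(pfd[0], &seen, sizeof(seen)) != (ssize_t)sizeof(seen)) {
            seen = -1;
            saved_errno = EIO;
        }
        waitpid(child, NULL, 0);
    }

    close(pfd[0]);
    free(stack);
    if (seen == -1)
        errno = saved_errno;
    return seen;
}
```

           The parent sees the same child under a different, larger PID (the
           return value of clone(2)); only the child's own view is 1.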
    
           Use of PID namespaces requires a kernel that is  configured  with
           the CONFIG_PID_NS option.
    
       The namespace init process
           The  first  process created in a new namespace (i.e., the process
           created using clone(2) with the CLONE_NEWPID flag, or  the  first
           child  created  by a process after a call to unshare(2) using the
           CLONE_NEWPID flag) has the PID 1, and is the "init"  process  for
           the  namespace  (see  init(1)).  A child process that is orphaned
           within the namespace will be reparented to  this  process  rather
           than init(1) (unless one of the ancestors of the child in the same
           PID namespace employed the prctl(2) PR_SET_CHILD_SUBREAPER command
           to mark itself as the reaper of orphaned descendant processes).
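
           The subreaper exception can be observed without any namespace
           privileges.  The sketch below is only an illustration: it assumes
           Linux 3.4 or later, and the function name, the struct, and the
           two-pipe handshake are hypothetical.  The caller marks itself as
           a subreaper with prctl(2) PR_SET_CHILD_SUBREAPER, lets a middle
           process orphan a grandchild, and then checks what the grandchild
           sees from getppid(2).

```c
#include <sys/prctl.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

struct report {
    pid_t self;     /* PID of the orphaned grandchild */
    pid_t parent;   /* what getppid() returned after it was orphaned */
};

/*
 * Returns 1 if an orphaned grandchild was reparented to the caller,
 * 0 if it was reparented elsewhere, and -1 on error.
 */
int orphan_adopted_by_subreaper(void)
{
    int go[2], res[2], status, adopted = -1;
    struct report r;
    char token = 'x';
    pid_t middle, grandchild;

    if (prctl(PR_SET_CHILD_SUBREAPER, 1, 0, 0, 0) == -1)
        return -1;
    if (pipe(go) == -1)
        return -1;
    if (pipe(res) == -1) {
        close(go[0]);
        close(go[1]);
        return -1;
    }

    middle = fork();
    if (middle == 0) {
        grandchild = fork();
        if (grandchild == 0) {
            /* Grandchild: wait until told that the middle process is gone. */
            close(go[1]);
            if (read(go[0], &token, 1) != 1)
                _exit(1);
            r.self = getpid();
            r.parent = getppid();
            if (write(res[1], &r, sizeof(r)) != (ssize_t)sizeof(r))
                _exit(1);
            _exit(0);
        }
        /* Middle process exits at once, orphaning the grandchild. */
        _exit(grandchild > 0 ? 0 : 1);
    }

    close(go[0]);
    close(res[1]);
    if (middle > 0 && waitpid(middle, &status, 0) == middle &&
        WIFEXITED(status) && WEXITSTATUS(status) == 0) {
        /* The middle process has been reaped, so reparenting is complete. */
        if (write(go[1], &token, 1) == 1 &&
            read(res[0], &r, sizeof(r)) == (ssize_t)sizeof(r)) {
            adopted = (r.parent == getpid());
            waitpid(r.self, NULL, 0);
        }
    }
    close(go[1]);
    close(res[0]);
    prctl(PR_SET_CHILD_SUBREAPER, 0, 0, 0, 0);   /* restore the default */
    return adopted;
}
```

           Without the first prctl(2) call, the grandchild would instead be
           reparented to the "init" process of its PID namespace and the
           function would return 0.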
    
           If  the  "init" process of a PID namespace terminates, the kernel
           terminates all of the processes in the namespace  via  a  SIGKILL
           signal.   This behavior reflects the fact that the "init" process
           is essential for the correct operation of a  PID  namespace.   In
           this case, a subsequent fork(2) into this PID namespace will fail
           with the error ENOMEM; it is not possible to create new processes
           in  a  PID namespace whose "init" process has terminated.
           Such scenarios can occur when, for example,  a  process  uses  an
           open  file descriptor for a /proc/[pid]/ns/pid file corresponding
           to a process that was in a namespace to setns(2) into that names‐
           pace  after  the "init" process has terminated.  Another possible
           scenario can occur after a call to unshare(2): if the first child
           subsequently  created  by  a  fork(2) terminates, then subsequent
           calls to fork(2) will fail with ENOMEM.
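
           The unshare(2) scenario can be sketched as follows.  This is an
           illustration under stated assumptions, not a reference test: the
           function and enum names are hypothetical, and the probe runs in a
           throwaway helper process so that the caller's own ability to
           fork(2) is not affected.  Creating the namespace requires
           CAP_SYS_ADMIN; without it the function returns -1.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <sched.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Exit codes used by the helper process to report what it observed. */
enum { FORK_ENOMEM = 10, FORK_WORKED = 11, PROBE_ERROR = 12 };

/* Runs in the helper: unshare, let "init" exit, then try to fork again. */
static int probe(void)
{
    pid_t first, second;

    if (unshare(CLONE_NEWPID) == -1)
        return PROBE_ERROR;

    first = fork();             /* becomes PID 1 of the new namespace */
    if (first == -1)
        return PROBE_ERROR;
    if (first == 0)
        _exit(0);               /* the namespace "init" terminates */
    if (waitpid(first, NULL, 0) == -1)
        return PROBE_ERROR;

    second = fork();            /* the namespace no longer has an "init" */
    if (second == 0)
        _exit(0);
    if (second > 0) {
        waitpid(second, NULL, 0);
        return FORK_WORKED;
    }
    return errno == ENOMEM ? FORK_ENOMEM : PROBE_ERROR;
}

/*
 * Returns 1 if fork(2) failed with ENOMEM after the namespace "init"
 * exited, 0 if the second fork unexpectedly succeeded, and -1 if the
 * probe could not be carried out (for example, no CAP_SYS_ADMIN).
 */
int fork_fails_after_init_exit(void)
{
    int status;
    pid_t helper = fork();

    if (helper == -1)
        return -1;
    if (helper == 0)
        _exit(probe());
    if (waitpid(helper, &status, 0) == -1 || !WIFEXITED(status))
        return -1;

    switch (WEXITSTATUS(status)) {
    case FORK_ENOMEM: return 1;
    case FORK_WORKED: return 0;
    default:          return -1;
    }
}
```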
    
           Only signals for which the "init" process has established a  sig‐
           nal handler can be sent to the "init" process by other members of
           the PID namespace.  This restriction applies even  to  privileged
           processes,  and  prevents other members of the PID namespace from
           accidentally killing the "init" process.
    
           Likewise, a process in an ancestor namespace can—subject  to  the
           usual  permission checks described in kill(2)—send signals to the
           "init" process of a  child  PID  namespace  only  if  the  "init"
           process  has  established a handler for that signal.  (Within the
           handler, the siginfo_t si_pid  field  described  in  sigaction(2)
           will  be  zero.)   SIGKILL  or SIGSTOP are treated exceptionally:
           these signals are forcibly delivered when sent from  an  ancestor
           PID  namespace.   Neither  of  these signals can be caught by the
           "init" process, and so will result in the usual  actions  associ‐
           ated  with  those signals (respectively, terminating and stopping
           the process).
    
           Starting with Linux 3.4, the reboot(2) system call causes a signal
           to be sent to the namespace "init" process.  See reboot(2) for more
           details.
    
       Nesting PID namespaces
           PID namespaces can be nested: each PID namespace  has  a  parent,
           except  for  the initial ("root") PID namespace.  The parent of a
           PID namespace is the PID namespace of the  process  that  created
           the  namespace using clone(2) or unshare(2).  PID namespaces thus
           form a tree, with all namespaces ultimately tracing their  ances‐
           try to the root namespace.
    
           A process is visible to other processes in its PID namespace, and
           to the processes in each direct ancestor PID namespace going back
           to the root PID namespace.  In this context, "visible" means that
           one process can be the target of operations  by  another  process
           using  system  calls  that specify a process ID.  Conversely, the
           processes in a child PID namespace can't  see  processes  in  the
           parent  and further removed ancestor namespaces.  More succinctly:
           a process can see (e.g., send signals with kill(2), set nice val‐
           ues  with  setpriority(2),  etc.) only processes contained in its
           own PID namespace and in descendants of that namespace.
    
           A process has one process ID in each of the  layers  of  the  PID
           namespace hierarchy in which it is visible, walking back through
           each direct ancestor namespace to the root PID namespace.
           System calls that operate on process IDs always operate using the
           process ID that is visible in the PID namespace of the caller.  A
           call  to  getpid(2)  always  returns  the PID associated with the
           namespace in which the process was created.
    
           Some processes in a PID namespace may have parents that are  out‐
           side  of  the  namespace.  For example, the parent of the initial
           process in the namespace (i.e., the init(1) process with  PID  1)
           is  necessarily in another namespace.  Likewise, the direct chil‐
           dren of a process that uses setns(2) to  cause  its  children  to
           join  a  PID  namespace are in a different PID namespace from the
           caller of setns(2).   Calls  to  getppid(2)  for  such  processes
           return 0.
    
       setns(2) and unshare(2) semantics
           Calls  to  setns(2)  that specify a PID namespace file descriptor
           and calls to unshare(2) with the CLONE_NEWPID flag cause children
           subsequently  created  by  the caller to be placed in a different
           PID namespace from the caller.   These  calls  do  not,  however,
           change the PID namespace of the calling process, because doing so
           would change the caller's idea of its own  PID  (as  reported  by
           getpid()), which would break many applications and libraries.
    
           To  put  things another way: a process's PID namespace membership
           is determined when the process is created and cannot  be  changed
           thereafter.   Among  other  things,  this means that the parental
           relationship between processes mirrors the parental  relationship
           between  PID namespaces: the parent of a process is either in the
           same namespace or resides in the immediate parent PID namespace.
    
       Compatibility of CLONE_NEWPID with other CLONE_* flags
           CLONE_NEWPID can't be combined with some other CLONE_* flags:
    
           *  CLONE_THREAD requires being in the same PID namespace in order
              that the threads in a process can send signals to each
              other.  Similarly, it must be  possible  to  see  all  of  the
              threads of a process in the proc(5) filesystem.
    
           *  CLONE_SIGHAND requires being in the same PID namespace; other‐
              wise the process ID of the process sending a signal could  not
              be  meaningfully  encoded  when  a  signal  is  sent  (see the
              description of the siginfo_t type in sigaction(2)).  A  signal
              queue  shared  by  processes  in  multiple PID namespaces will
              defeat that.
    
           *  CLONE_VM requires all of the threads to be  in  the  same  PID
              namespace,  because, from the point of view of a core dump, if
              two processes share the same address space  they  are  threads
              and  will  be core dumped together.  When a core dump is writ‐
              ten, the PID of each thread is written  into  the  core  dump.
              Writing the process IDs could not meaningfully succeed if some
              of the process IDs were in a parent PID namespace.
    
           To summarize: there  is  a  technical  requirement  for  each  of
           CLONE_THREAD, CLONE_SIGHAND, and CLONE_VM to share a PID namespace.
           (Note furthermore that clone(2) requires CLONE_VM to be
           specified  if CLONE_THREAD or CLONE_SIGHAND is specified.)  Thus,
           call sequences such as the following will fail  (with  the  error
           EINVAL):
    
               unshare(CLONE_NEWPID);
               clone(..., CLONE_VM, ...);    /* Fails */
    
               setns(fd, CLONE_NEWPID);
               clone(..., CLONE_VM, ...);    /* Fails */
    
               clone(..., CLONE_VM, ...);
               setns(fd, CLONE_NEWPID);      /* Fails */
    
               clone(..., CLONE_VM, ...);
               unshare(CLONE_NEWPID);        /* Fails */
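
           A related check that can be run directly is to request a new PID
           namespace and a new thread in the same clone(2) call.  The sketch
           below is an assumption-laden illustration rather than a
           restatement of the four sequences above: it only tests the
           CLONE_THREAD combination in a single call, the function names are
           hypothetical, and which combinations are rejected may vary
           between kernel versions.  A seccomp filter may also reject the
           call with EPERM before the kernel's own flag check is reached.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <sched.h>
#include <stdlib.h>

#define STACK_SIZE (64 * 1024)

/* Thread body; it should never run, because clone(2) is expected to fail. */
static int never_runs(void *arg)
{
    (void)arg;
    return 0;
}

/*
 * Try to create a thread directly in a new PID namespace.  Returns the
 * errno value from clone(2) (expected: EINVAL), or 0 if the call
 * unexpectedly succeeded.
 */
int errno_for_thread_in_new_pidns(void)
{
    int flags = CLONE_NEWPID | CLONE_THREAD | CLONE_SIGHAND | CLONE_VM;
    char *stack = malloc(STACK_SIZE);
    int err;

    if (stack == NULL)
        return ENOMEM;

    if (clone(never_runs, stack + STACK_SIZE, flags, NULL) != -1)
        return 0;   /* stack deliberately not freed: a thread may use it */

    err = errno;
    free(stack);
    return err;
}
```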
    
       /proc and PID namespaces
           A /proc filesystem shows (in the /proc/PID directories) only pro‐
           cesses visible in the PID namespace of the process that performed
           the  mount, even if the /proc filesystem is viewed from processes
           in other namespaces.
    
           After creating a new PID namespace, it is useful for the child to
           change  its  root  directory  and  mount a new procfs instance at
           /proc so that tools such as ps(1) work correctly.  If a new mount
           namespace  is  simultaneously created by including CLONE_NEWNS in
           the flags argument of clone(2) or unshare(2), then it isn't  nec‐
           essary to change the root directory: a new procfs instance can be
           mounted directly over /proc.
    
           From a shell, the command to mount /proc is:
    
               $ mount -t proc proc /proc
    
           Calling readlink(2) on the path /proc/self yields the process  ID
           of the caller in the PID namespace of the procfs mount (i.e., the
           PID namespace of the process that mounted the procfs).  This  can
           be  useful  for  introspection  purposes, when a process wants to
           discover its PID in other namespaces.
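
           A minimal sketch of that introspection step, assuming a procfs
           instance is mounted at /proc (the function name is hypothetical):

```c
#include <errno.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Return the caller's PID as seen through the procfs mounted at /proc, or
 * -1 if /proc/self cannot be read or does not look like a PID.
 */
pid_t pid_via_proc_self(void)
{
    char buf[32];
    char *end;
    long pid;
    ssize_t len = readlink("/proc/self", buf, sizeof(buf) - 1);

    if (len <= 0)
        return -1;
    buf[len] = '\0';            /* readlink(2) does not add a terminator */

    errno = 0;
    pid = strtol(buf, &end, 10);
    if (errno != 0 || *end != '\0' || pid <= 0)
        return -1;
    return (pid_t)pid;
}
```

           When the procfs instance was mounted from the caller's own PID
           namespace, the result should equal getpid(2); a different value
           indicates that the mount belongs to another PID namespace.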
    
       Miscellaneous
           When a process ID is passed  over  a  UNIX  domain  socket  to  a
           process  in  a  different  PID  namespace (see the description of
           SCM_CREDENTIALS in unix(7)), it is  translated  into  the  corre‐
           sponding PID value in the receiving process's PID namespace.
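
           The receiving side of that mechanism can be sketched as follows.
           This is an illustration only: the function name is hypothetical,
           and both ends of the socket pair live in one process, so no
           translation is visible here and the reported PID should simply
           equal the caller's getpid(2).  It relies on the SO_PASSCRED
           socket option described in unix(7).

```c
#define _GNU_SOURCE
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/*
 * Send one byte over a UNIX domain socket pair and return the sender PID
 * that the kernel reports to the receiver, or -1 on error.
 */
pid_t sender_pid_from_credentials(void)
{
    int sv[2], on = 1;
    char byte = 'x';
    pid_t result = -1;
    union {
        char buf[CMSG_SPACE(sizeof(struct ucred))];
        struct cmsghdr align;   /* forces suitable alignment of buf */
    } control;
    struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
    struct msghdr msg;
    struct cmsghdr *cmsg;
    struct ucred cred;

    if (socketpair(AF_UNIX, SOCK_DGRAM, 0, sv) == -1)
        return -1;

    /* Ask the kernel to attach sender credentials to received messages. */
    if (setsockopt(sv[1], SOL_SOCKET, SO_PASSCRED, &on, sizeof(on)) == -1)
        goto out;
    if (send(sv[0], &byte, 1, 0) != 1)
        goto out;

    memset(&msg, 0, sizeof(msg));
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = control.buf;
    msg.msg_controllen = sizeof(control.buf);
    if (recvmsg(sv[1], &msg, 0) == -1)
        goto out;

    for (cmsg = CMSG_FIRSTHDR(&msg); cmsg != NULL;
         cmsg = CMSG_NXTHDR(&msg, cmsg)) {
        if (cmsg->cmsg_level == SOL_SOCKET &&
            cmsg->cmsg_type == SCM_CREDENTIALS) {
            memcpy(&cred, CMSG_DATA(cmsg), sizeof(cred));
            result = cred.pid;  /* expressed in the receiver's namespace */
        }
    }
out:
    close(sv[0]);
    close(sv[1]);
    return result;
}
```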
    
    CONFORMING TO
           Namespaces are a Linux-specific feature.
    
    EXAMPLE
           See user_namespaces(7).
    
    SEE ALSO
           clone(2), setns(2), unshare(2), proc(5), credentials(7), capabil‐
           ities(7), user_namespaces(7), switch_root(8)
    
    
    
    -- 
    Michael Kerrisk
    Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
    Linux/UNIX System Programming Training: http://man7.org/training/
    
    [-- Attachment #2: pid_namespaces.7 --]
    [-- Type: application/x-troff-man, Size: 11363 bytes --]
    
    * For review: pid_namespaces(7) man page
    @ 2013-02-28 11:24 Michael Kerrisk (man-pages)
    From: Michael Kerrisk (man-pages) @ 2013-02-28 11:24 UTC (permalink / raw)
      To: Eric W. Biederman; +Cc: linux-man, Linux Containers, lkml
    
    [-- Attachment #1: Type: text/plain, Size: 18548 bytes --]
    
    Eric et al,
    
    Eventually, there will be more namespace man pages, but let us start
    now with one for PID namespaces. The attached page aims to provide a
    fairly complete overview of PID namespaces.
    
    Eric, various pieces of the page are shifted out of other pages
    (clone(2), setns(2), etc.) and are derived from comments you've
    emailed me off list, so you are (jointly) in the copyright of the
    page. I've chosen the common license for man-pages; let me know if you
    have any objections to that license.
    
    I'm looking for review comments (corrections, improvements, additions,
    etc.) on this page. I've provided it in two forms inline below, and
    reviewers can comment on whichever form they are most
    comfortable with:
    
    1) The rendered page as plain text
    2) The *roff source (also attached); rendering that source will enable
    readers to see proper formatting for the page.
    
    Note that the namespaces(7) page referred to in this page is not yet
    finished; I'll send it out for review at a future time.
    
    Thanks,
    
    Michael
    
    ==========
    PID_NAMESPACES(7)      Linux Programmer's Manual     PID_NAMESPACES(7)
    
    NAME
           pid_namespaces - overview of Linux PID namespaces
    
    DESCRIPTION
           For an overview of namespaces, see namespaces(7).
    
           PID  namespaces  isolate  the  process ID number space, meaning
           that processes in different PID namespaces can  have  the  same
           PID.   PID namespaces allow containers to migrate to a new host
           while the processes inside  the  container  maintain  the  same
           PIDs.
    
           PIDs  in a new PID namespace start at 1, somewhat like a stand‐
           alone system, and calls to fork(2), vfork(2), or clone(2)  will
           produce  processes  with PIDs that are unique within the names‐
           pace.
    
           Use of PID namespaces requires a kernel that is configured with
           the CONFIG_PID_NS option.
    
       The namespace init process
           The first process created in a new namespace (i.e., the process
           created using clone(2) with the CLONE_NEWPID flag, or the first
           child created by a process after a call to unshare(2) using the
           CLONE_NEWPID flag) has the PID 1, and is the "init" process for
           the namespace (see init(1)).  Children that are orphaned within
           the namespace will be reparented to this  process  rather  than
           init(1).
    
           If the "init" process of a PID namespace terminates, the kernel
           terminates all of the processes in the namespace via a  SIGKILL
           signal.   This  behavior  reflects  the  fact  that  the "init"
           process is essential for the correct operation of a PID  names‐
           pace.   In this case, a subsequent fork(2) into this PID names‐
           pace (e.g., from a process that has done a  setns(2)  into  the
           namespace    using    an    open    file   descriptor   for   a
           /proc/[pid]/ns/pid file corresponding to a process that was  in
           the  namespace) will fail with the error ENOMEM; it is not possible
           to create new processes in a PID namespace whose "init"
           process has terminated.
    
           Only  signals  for  which  the "init" process has established a
           signal handler can be sent to the "init" process by other  mem‐
           bers  of  the  PID namespace.  This restriction applies even to
           privileged processes, and prevents other  members  of  the  PID
           namespace from accidentally killing the "init" process.
    
           Likewise, a process in an ancestor namespace can—subject to the
           usual permission checks described in  kill(2)—send  signals  to
           the  "init" process of a child PID namespace only if the "init"
           process has established a handler for that signal.  (Within the
           handler,  the  siginfo_t si_pid field described in sigaction(2)
           will be zero.)  SIGKILL or SIGSTOP are  treated  exceptionally:
           these signals are forcibly delivered when sent from an ancestor
           PID namespace.  Neither of these signals can be caught  by  the
           "init" process, and so will result in the usual actions associ‐
           ated with those signals (respectively, terminating and stopping
           the process).
    
       Nesting PID namespaces
           PID  namespaces can be nested: each PID namespace has a parent,
           except for the initial ("root") PID namespace.  The parent of a
           PID  namespace is the PID namespace of the process that created
           the namespace using clone(2)  or  unshare(2).   PID  namespaces
           thus  form a tree, with all namespaces ultimately tracing their
           ancestry to the root namespace.
    
           A process is visible to other processes in its  PID  namespace,
           and  to  the  processes  in  each direct ancestor PID namespace
           going back to the root PID namespace.  In this context,  "visi‐
           ble"  means that one process can be the target of operations by
           another process using system calls that specify a  process  ID.
           Conversely,  the  processes  in a child PID namespace can't see
           processes in the parent and further removed ancestor namespaces.
           More  succinctly:  a  process  can see (e.g., send signals with
           kill(2), set nice values with setpriority(2), etc.)  only  pro‐
           cesses contained in its own PID namespace and in descendants of
           that namespace.
    
           A process has one process ID in each of the layers of  the  PID
           namespace hierarchy in which it is visible, walking back through
           each direct ancestor namespace to the root PID namespace.  System
           calls that operate on process IDs always
           operate using the process ID that is visible in the PID  names‐
           pace of the caller.  A call to getpid(2) always returns the PID
           associated with the namespace in which the process was created.
    
           Some processes in a PID namespace may  have  parents  that  are
           outside  of the namespace.  For example, the parent of the ini‐
           tial process in the namespace (i.e., the init(1)  process  with
           PID  1)  is  necessarily  in  another namespace.  Likewise, the
           direct children of a process that uses setns(2)  to  cause  its
           children  to join a PID namespace are in a different PID names‐
           pace from the caller of setns(2).  Calls to getppid(2) for such
           processes return 0.
    
       setns(2) and unshare(2) semantics
           Calls  to setns(2) that specify a PID namespace file descriptor
           and calls to unshare(2) with the CLONE_NEWPID flag cause  chil‐
           dren  subsequently created by the caller to be placed in a dif‐
           ferent PID namespace from the caller.  These calls do not, how‐
           ever,  change the PID namespace of the calling process, because
           doing so would change the caller's idea  of  its  own  PID  (as
           reported  by getpid()), which would break many applications and
           libraries.
    
           To put things another way: a process's PID namespace membership
           is determined when the process is created and cannot be changed
           thereafter.  Among other things, this means that  the  parental
           relationship between processes mirrors the parental relationship
           between PID namespaces: the parent of a process is either in the same
           namespace or resides in the immediate parent PID namespace.
    
           Every  thread  in  a process must be in the same PID namespace.
           For this reason, the following two call sequences will fail:
    
               unshare(CLONE_NEWPID);
               clone(..., CLONE_VM, ...);    /* Fails */
    
               setns(fd, CLONE_NEWPID);
               clone(..., CLONE_VM, ...);    /* Fails */
    
           Because the above unshare(2) and setns(2) calls only change the
           PID  namespace  for created children, the clone(2) calls neces‐
           sarily put the new thread in a different PID namespace from the
           calling thread.
    
       Miscellaneous
           After  creating a new PID namespace, it is useful for the child
           to change its root directory and mount a new procfs instance at
           /proc  so  that  tools such as ps(1) work correctly.  (If a new
           mount  namespace  is  simultaneously   created   by   including
           CLONE_NEWNS  in  the flags argument of clone(2) or unshare(2),
           then it isn't necessary to change the  root  directory:  a  new
           procfs instance can be mounted directly over /proc.)
    
           Calling  readlink(2)  on the path /proc/self yields the process
           ID of the caller in the  PID  namespace  of  the  procfs  mount
           (i.e.,  the  PID  namespace  of  the  process  that mounted the
           procfs).
    
           When a process ID is passed over a  UNIX  domain  socket  to  a
           process  in  a  different PID namespace (see the description of
           SCM_CREDENTIALS in unix(7)), it is translated into  the  corre‐
           sponding PID value in the receiving process's PID namespace.
    
    CONFORMING TO
           Namespaces are a Linux-specific feature.
    
    SEE ALSO
           unshare(1),  clone(2),  setns(2),  unshare(2), proc(5), creden‐
           tials(7), capabilities(7), user_namespaces(7), switch_root(8)
    
    
    
    Linux                         2013-01-14             PID_NAMESPACES(7)
    
    
    =========== *roff source ==========
    
    $ cat pid_namespaces.7
    .\" Copyright (c) 2013 by Michael Kerrisk <mtk.manpages-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
    .\" and Copyright (c) 2012 by Eric W. Biederman <ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>
    .\"
    .\" Permission is granted to make and distribute verbatim copies of this
    .\" manual provided the copyright notice and this permission notice are
    .\" preserved on all copies.
    .\"
    .\" Permission is granted to copy and distribute modified versions of this
    .\" manual under the conditions for verbatim copying, provided that the
    .\" entire resulting derived work is distributed under the terms of a
    .\" permission notice identical to this one.
    .\"
    .\" Since the Linux kernel and libraries are constantly changing, this
    .\" manual page may be incorrect or out-of-date.  The author(s) assume no
    .\" responsibility for errors or omissions, or for damages resulting from
    .\" the use of the information contained herein.  The author(s) may not
    .\" have taken the same level of care in the production of this manual,
    .\" which is licensed free of charge, as they might when working
    .\" professionally.
    .\"
    .\" Formatted or processed versions of this manual, if unaccompanied by
    .\" the source, must acknowledge the copyright and authors of this work.
    .\"
    .\"
    .TH PID_NAMESPACES 7 2013-01-14 "Linux" "Linux Programmer's Manual"
    .SH NAME
    pid_namespaces \- overview of Linux PID namespaces
    .SH DESCRIPTION
    For an overview of namespaces, see
    .BR namespaces (7).
    
    PID namespaces isolate the process ID number space,
    meaning that processes in different PID namespaces can have the same PID.
    PID namespaces allow containers to migrate to a new host
    while the processes inside the container maintain the same PIDs.
    
    PIDs in a new PID namespace start at 1,
    somewhat like a standalone system, and calls to
    .BR fork (2),
    .BR vfork (2),
    or
    .BR clone (2)
    will produce processes with PIDs that are unique within the namespace.
    
    Use of PID namespaces requires a kernel that is configured with the
    .B CONFIG_PID_NS
    option.
    .\"
    .\" ============================================================
    .\"
    .SS The namespace "init" process
    The first process created in a new namespace
    (i.e., the process created using
    .BR clone (2)
    with the
    .BR CLONE_NEWPID
    flag, or the first child created by a process after a call to
    .BR unshare (2)
    using the
    .BR CLONE_NEWPID
    flag) has the PID 1, and is the "init" process for the namespace (see
    .BR init (1)).
    Children that are orphaned within the namespace will be reparented
    to this process rather than
    .BR init (1).
    
    If the "init" process of a PID namespace terminates,
    the kernel terminates all of the processes in the namespace via a
    .BR SIGKILL
    signal.
    This behavior reflects the fact that the "init" process
    is essential for the correct operation of a PID namespace.
    In this case, a subsequent
    .BR fork (2)
    into this PID namespace (e.g., from a process that has done a
    .BR setns (2)
    into the namespace using an open file descriptor for a
    .I /proc/[pid]/ns/pid
    file corresponding to a process that was in the namespace)
    will fail with the error
    .BR ENOMEM ;
    it is not possible to create new processes in a PID namespace whose "init"
    process has terminated.
    
    Only signals for which the "init" process has established a signal handler
    can be sent to the "init" process by other members of the PID namespace.
    This restriction applies even to privileged processes,
    and prevents other members of the PID namespace from
    accidentally killing the "init" process.
    
    Likewise, a process in an ancestor namespace
    can\(emsubject to the usual permission checks described in
    .BR kill (2)\(emsend
    signals to the "init" process of a child PID namespace only
    if the "init" process has established a handler for that signal.
    (Within the handler, the
    .I siginfo_t
    .I si_pid
    field described in
    .BR sigaction (2)
    will be zero.)
    .B SIGKILL
    or
    .B SIGSTOP
    are treated exceptionally:
    these signals are forcibly delivered when sent from an ancestor PID namespace.
    Neither of these signals can be caught by the "init" process,
    and so will result in the usual actions associated with those signals
    (respectively, terminating and stopping the process).
    .\"
    .\" ============================================================
    .\"
    .SS Nesting PID namespaces
    PID namespaces can be nested:
    each PID namespace has a parent,
    except for the initial ("root") PID namespace.
    The parent of a PID namespace is the PID namespace of the process that
    created the namespace using
    .BR clone (2)
    or
    .BR unshare (2).
    PID namespaces thus form a tree,
    with all namespaces ultimately tracing their ancestry to the root namespace.
    
    A process is visible to other processes in its PID namespace,
    and to the processes in each direct ancestor PID namespace
    going back to the root PID namespace.
    In this context, "visible" means that one process
    can be the target of operations by another process using
    system calls that specify a process ID.
    Conversely, the processes in a child PID namespace can't see
    processes in the parent and further removed ancestor namespaces.
    More succinctly: a process can see (e.g., send signals with
    .BR kill (2),
    set nice values with
    .BR setpriority (2),
    etc.) only processes contained in its own PID namespace
    and in descendants of that namespace.
    
    A process has one process ID in each of the layers of the PID
    namespace hierarchy in which it is visible,
    walking back through each direct ancestor namespace
    to the root PID namespace.
    System calls that operate on process IDs always
    operate using the process ID that is visible in the
    PID namespace of the caller.
    A call to
    .BR getpid (2)
    always returns the PID associated with the namespace in which
    the process was created.
    
    Some processes in a PID namespace may have parents
    that are outside of the namespace.
    For example, the parent of the initial process in the namespace
    (i.e., the
    .BR init (1)
    process with PID 1) is necessarily in another namespace.
    Likewise, the direct children of a process that uses
    .BR setns (2)
    to cause its children to join a PID namespace are in a different
    PID namespace from the caller of
    .BR setns (2).
    Calls to
    .BR getppid (2)
    for such processes return 0.
    .\"
    .\" ============================================================
    .\"
    .SS setns(2) and unshare(2) semantics
    Calls to
    .BR setns (2)
    that specify a PID namespace file descriptor
    and calls to
    .BR unshare (2)
    with the
    .BR CLONE_NEWPID
    flag cause children subsequently created
    by the caller to be placed in a different PID namespace from the caller.
    These calls do not, however,
    change the PID namespace of the calling process,
    because doing so would change the caller's idea of its own PID
    (as reported by
    .BR getpid ()),
    which would break many applications and libraries.
    
    To put things another way:
    a process's PID namespace membership is determined when the process is created
    and cannot be changed thereafter.
    Among other things, this means that the parental relationship
    between processes mirrors the parental relationship between PID namespaces:
    the parent of a process is either in the same namespace
    or resides in the immediate parent PID namespace.
    
    Every thread in a process must be in the same PID namespace.
    For this reason, the following two call sequences will fail:
    
    .nf
        unshare(CLONE_NEWPID);
        clone(..., CLONE_VM, ...);    /* Fails */
    
        setns(fd, CLONE_NEWPID);
        clone(..., CLONE_VM, ...);    /* Fails */
    .fi
    
    Because the above
    .BR unshare (2)
    and
    .BR setns (2)
    calls only change the PID namespace for created children, the
    .BR clone (2)
    calls necessarily put the new thread in a different PID namespace from
    the calling thread.
    .\"
    .\" ============================================================
    .\"
    .SS Miscellaneous
    After creating a new PID namespace,
    it is useful for the child to change its root directory
    and mount a new procfs instance at
    .I /proc
    so that tools such as
    .BR ps (1)
    work correctly.
    .\" mount -t proc proc /proc
    (If a new mount namespace is simultaneously created by including
    .BR CLONE_NEWNS
    in the
    .IR flags
    argument of
    .BR clone (2)
    or
    .BR unshare (2),
    then it isn't necessary to change the root directory:
    a new procfs instance can be mounted directly over
    .IR /proc .)
    
    Calling
    .BR readlink (2)
    on the path
    .I /proc/self
    yields the process ID of the caller in the PID namespace of the procfs mount
    (i.e., the PID namespace of the process that mounted the procfs).
    
    When a process ID is passed over a UNIX domain socket to a
    process in a different PID namespace (see the description of
    .B SCM_CREDENTIALS
    in
    .BR unix (7)),
    it is translated into the corresponding PID value in
    the receiving process's PID namespace.
    .SH CONFORMING TO
    Namespaces are a Linux-specific feature.
    .SH SEE ALSO
    .BR unshare (1),
    .BR clone (2),
    .BR setns (2),
    .BR unshare (2),
    .BR proc (5),
    .BR credentials (7),
    .BR capabilities (7),
    .BR user_namespaces (7),
    .BR switch_root (8)
    
    [-- Attachment #2: pid_namespaces.7 --]
    [-- Type: application/octet-stream, Size: 8766 bytes --]
    
    .\" Copyright (c) 2013 by Michael Kerrisk <mtk.manpages@gmail.com>
    .\" and Copyright (c) 2012 by Eric W. Biederman <ebiederm@xmission.com>
    .\"
    .\" Permission is granted to make and distribute verbatim copies of this
    .\" manual provided the copyright notice and this permission notice are
    .\" preserved on all copies.
    .\"
    .\" Permission is granted to copy and distribute modified versions of this
    .\" manual under the conditions for verbatim copying, provided that the
    .\" entire resulting derived work is distributed under the terms of a
    .\" permission notice identical to this one.
    .\"
    .\" Since the Linux kernel and libraries are constantly changing, this
    .\" manual page may be incorrect or out-of-date.  The author(s) assume no
    .\" responsibility for errors or omissions, or for damages resulting from
    .\" the use of the information contained herein.  The author(s) may not
    .\" have taken the same level of care in the production of this manual,
    .\" which is licensed free of charge, as they might when working
    .\" professionally.
    .\"
    .\" Formatted or processed versions of this manual, if unaccompanied by
    .\" the source, must acknowledge the copyright and authors of this work.
    .\"
    .\"
    .TH PID_NAMESPACES 7 2013-01-14 "Linux" "Linux Programmer's Manual"
    .SH NAME
    pid_namespaces \- overview of Linux PID namespaces
    .SH DESCRIPTION
    For an overview of namespaces, see
    .BR namespaces (7).
    
    PID namespaces isolate the process ID number space,
    meaning that processes in different PID namespaces can have the same PID.
    PID namespaces allow containers to migrate to a new host
    while the processes inside the container maintain the same PIDs.
    
    PIDs in a new PID namespace start at 1,
    somewhat like a standalone system, and calls to
    .BR fork (2),
    .BR vfork (2),
    or
    .BR clone (2)
    will produce processes with PIDs that are unique within the namespace.
    
    Use of PID namespaces requires a kernel that is configured with the
    .B CONFIG_PID_NS
    option.
    .\"
    .\" ============================================================
    .\"
    .SS The namespace "init" process
    The first process created in a new namespace
    (i.e., the process created using
    .BR clone (2)
    with the
    .BR CLONE_NEWPID
    flag, or the first child created by a process after a call to
    .BR unshare (2)
    using the
    .BR CLONE_NEWPID
    flag) has the PID 1, and is the "init" process for the namespace (see
    .BR init (1)).
    Children that are orphaned within the namespace will be reparented
    to this process rather than
    .BR init (1).
    
    If the "init" process of a PID namespace terminates,
    the kernel terminates all of the processes in the namespace via a
    .BR SIGKILL
    signal.
    This behavior reflects the fact that the "init" process
    is essential for the correct operation of a PID namespace.
    In this case, a subsequent
    .BR fork (2)
    into this PID namespace (e.g., from a process that has done a
    .BR setns (2)
    into the namespace using an open file descriptor for a
    .I /proc/[pid]/ns/pid
    file corresponding to a process that was in the namespace)
    will fail with the error
    .BR ENOMEM ;
    it is not possible to create a new process in a PID namespace whose "init"
    process has terminated.
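
The ENOMEM behavior described above can be observed with a short test.
What follows is only a sketch, not part of the original page: the helper
name fork_errno_after_init_exit() is hypothetical, and the code assumes
that unprivileged user namespaces are enabled, so that CLONE_NEWUSER can
supply the capability needed for CLONE_NEWPID. All of the work is done in
a throwaway helper process so that the caller's own state is untouched.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <sched.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper. Returns the errno of a fork(2) attempted after the
 * "init" process of a fresh PID namespace has exited, 0 if that fork
 * unexpectedly succeeded, or -1 if the namespace could not be set up.
 */
int fork_errno_after_init_exit(void)
{
    int pfd[2];
    if (pipe(pfd) == -1)
        return -1;

    pid_t helper = fork();
    if (helper == -1) {
        close(pfd[0]);
        close(pfd[1]);
        return -1;
    }

    if (helper == 0) {
        int result = -1;
        close(pfd[0]);
        /* Assumption: unprivileged user namespaces are available. */
        if (unshare(CLONE_NEWUSER | CLONE_NEWPID) == 0) {
            pid_t init = fork();          /* PID 1 in the new namespace */
            if (init == 0)
                _exit(0);                 /* "init" terminates at once */
            if (init > 0) {
                waitpid(init, NULL, 0);
                pid_t again = fork();     /* namespace has no "init" now */
                if (again == -1) {
                    result = errno;
                } else if (again == 0) {
                    _exit(0);
                } else {
                    waitpid(again, NULL, 0);
                    result = 0;
                }
            }
        }
        if (write(pfd[1], &result, sizeof(result)) != (ssize_t) sizeof(result))
            _exit(1);
        _exit(0);
    }

    close(pfd[1]);
    int result = -1;
    if (read(pfd[0], &result, sizeof(result)) != (ssize_t) sizeof(result))
        result = -1;
    close(pfd[0]);
    waitpid(helper, NULL, 0);
    return result;
}
```

On a system where the namespace can be created, the return value would be
expected to equal ENOMEM; a return of -1 only says that the experiment
could not be run.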
    
    Only signals for which the "init" process has established a signal handler
    can be sent to the "init" process by other members of the PID namespace.
    This restriction applies even to privileged processes,
    and prevents other members of the PID namespace from
    accidentally killing the "init" process.
    
    Likewise, a process in an ancestor namespace
    can\(emsubject to the usual permission checks described in
    .BR kill (2)\(emsend
    signals to the "init" process of a child PID namespace only
    if the "init" process has established a handler for that signal.
    (Within the handler, the
    .I siginfo_t
    .I si_pid
    field described in
    .BR sigaction (2)
    will be zero.)
    .B SIGKILL
    or
    .B SIGSTOP
    are treated exceptionally:
    these signals are forcibly delivered when sent from an ancestor PID namespace.
    Neither of these signals can be caught by the "init" process,
    and so will result in the usual actions associated with those signals
    (respectively, terminating and stopping the process).
    .\"
    .\" ============================================================
    .\"
    .SS Nesting PID namespaces
    PID namespaces can be nested:
    each PID namespace has a parent,
    except for the initial ("root") PID namespace.
    The parent of a PID namespace is the PID namespace of the process that
    created the namespace using
    .BR clone (2)
    or
    .BR unshare (2).
    PID namespaces thus form a tree,
    with all namespaces ultimately tracing their ancestry to the root namespace.
    
    A process is visible to other processes in its PID namespace,
    and to the processes in each direct ancestor PID namespace
    going back to the root PID namespace.
    In this context, "visible" means that one process
    can be the target of operations by another process using
    system calls that specify a process ID.
    Conversely, the processes in a child PID namespace can't see
    processes in the parent and more distant ancestor namespaces.
    More succinctly: a process can see (e.g., send signals with
    .BR kill (2),
    set nice values with
    .BR setpriority (2),
    etc.) only processes contained in its own PID namespace
    and in descendants of that namespace.
    
    A process has one process ID in each layer of the PID
    namespace hierarchy in which it is visible:
    one in the PID namespace in which it resides,
    and one in each direct ancestor namespace
    back through to the root PID namespace.
    System calls that operate on process IDs always
    operate using the process ID that is visible in the
    PID namespace of the caller.
    A call to
    .BR getpid (2)
    always returns the PID associated with the namespace in which
    the process was created.
    
    Some processes in a PID namespace may have parents
    that are outside of the namespace.
    For example, the parent of the initial process in the namespace
    (i.e., the
    .BR init (1)
    process with PID 1) is necessarily in another namespace.
    Likewise, the direct children of a process that uses
    .BR setns (2)
    to cause its children to join a PID namespace are in a different
    PID namespace from the caller of
    .BR setns (2).
    Calls to
    .BR getppid (2)
    for such processes return 0.
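
The different views of one process can be made visible with a small
experiment. The following is a sketch under stated assumptions, not text
from the original page: struct pid_views and observe_pid_views() are
hypothetical names, and the code assumes that unprivileged user namespaces
are enabled so that CLONE_NEWPID can be used without privilege. It records
the PID of the first child as returned by fork(2) in the outer namespace,
and the values of getpid(2) and getppid(2) as seen by that child.

```c
#define _GNU_SOURCE
#include <sched.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical record of how one child is numbered from each side. */
struct pid_views {
    pid_t child_seen_by_parent;   /* fork() result, outer namespace */
    pid_t child_seen_by_itself;   /* getpid() inside the new namespace */
    pid_t parent_seen_by_child;   /* getppid() inside the new namespace */
};

/* Returns 0 and fills *out, or -1 if the namespace is unavailable. */
int observe_pid_views(struct pid_views *out)
{
    int pfd[2];
    if (pipe(pfd) == -1)
        return -1;

    pid_t helper = fork();        /* keeps the caller's state untouched */
    if (helper == -1) {
        close(pfd[0]);
        close(pfd[1]);
        return -1;
    }
    if (helper == 0) {
        close(pfd[0]);
        /* Assumption: unprivileged user namespaces are available. */
        if (unshare(CLONE_NEWUSER | CLONE_NEWPID) == -1)
            _exit(1);
        pid_t child = fork();     /* "init" of the new namespace */
        if (child == -1)
            _exit(1);
        if (child == 0) {
            pid_t inner[2] = { getpid(), getppid() };
            ssize_t n = write(pfd[1], inner, sizeof(inner));
            _exit(n == (ssize_t) sizeof(inner) ? 0 : 1);
        }
        int status;
        if (waitpid(child, &status, 0) == -1 ||
            !WIFEXITED(status) || WEXITSTATUS(status) != 0)
            _exit(1);
        ssize_t n = write(pfd[1], &child, sizeof(child));
        _exit(n == (ssize_t) sizeof(child) ? 0 : 1);
    }

    close(pfd[1]);
    pid_t v[3];
    size_t got = 0;
    while (got < sizeof(v)) {     /* two writes may arrive separately */
        ssize_t n = read(pfd[0], (char *) v + got, sizeof(v) - got);
        if (n <= 0)
            break;
        got += (size_t) n;
    }
    close(pfd[0]);
    waitpid(helper, NULL, 0);
    if (got != sizeof(v))
        return -1;

    out->child_seen_by_itself = v[0];
    out->parent_seen_by_child = v[1];
    out->child_seen_by_parent = v[2];
    return 0;
}
```

When the call succeeds, the child would be expected to report PID 1 and a
parent PID of 0, while the creating process sees the same child under an
ordinary PID from its own namespace.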
    .\"
    .\" ============================================================
    .\"
    .SS setns(2) and unshare(2) semantics
    Calls to
    .BR setns (2)
    that specify a PID namespace file descriptor
    and calls to
    .BR unshare (2)
    with the
    .BR CLONE_NEWPID
    flag cause children subsequently created
    by the caller to be placed in a different PID namespace from the caller.
    These calls do not, however,
    change the PID namespace of the calling process,
    because doing so would change the caller's idea of its own PID
    (as reported by
    .BR getpid (2)),
    which would break many applications and libraries.
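
A minimal way to check this claim is to compare getpid(2) before and after
the call. The sketch below is illustrative only: the function name
pid_unchanged_by_unshare() is hypothetical, and CLONE_NEWUSER is added on
the assumption that unprivileged user namespaces are enabled, so that the
test does not need privilege.

```c
#define _GNU_SOURCE
#include <sched.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper. Returns 1 if getpid() reports the same value before
 * and after unshare(CLONE_NEWPID), 0 if the value changed, and -1 if the
 * unshare() call could not be made on this system.
 */
int pid_unchanged_by_unshare(void)
{
    pid_t helper = fork();        /* run the experiment in a helper */
    if (helper == -1)
        return -1;

    if (helper == 0) {
        pid_t before = getpid();
        /* Assumption: unprivileged user namespaces are available. */
        if (unshare(CLONE_NEWUSER | CLONE_NEWPID) == -1)
            _exit(2);
        _exit(getpid() == before ? 0 : 1);
    }

    int status;
    if (waitpid(helper, &status, 0) == -1 || !WIFEXITED(status))
        return -1;

    switch (WEXITSTATUS(status)) {
    case 0:
        return 1;                 /* PID unchanged */
    case 1:
        return 0;                 /* PID changed */
    default:
        return -1;                /* unshare() failed */
    }
}
```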
    
    To put things another way:
    a process's PID namespace membership is determined when the process is created
    and cannot be changed thereafter.
    Among other things, this means that the parental relationship
    between processes mirrors the parental relationship between PID namespaces:
    the parent of a process is either in the same namespace
    or resides in the immediate parent PID namespace.
    
    Every thread in a process must be in the same PID namespace.
    For this reason, the following two call sequences will fail:
    
    .nf
        unshare(CLONE_NEWPID);
        clone(..., CLONE_VM, ...);    /* Fails */
    
        setns(fd, CLONE_NEWPID);
        clone(..., CLONE_VM, ...);    /* Fails */
    .fi
    
    Because the above
    .BR unshare (2)
    and
    .BR setns (2)
    calls only change the PID namespace for created children, the
    .BR clone (2)
    calls necessarily put the new thread in a different PID namespace from
    the calling thread.
    .\"
    .\" ============================================================
    .\"
    .SS Miscellaneous
    After creating a new PID namespace,
    it is useful for the child to change its root directory
    and mount a new procfs instance at
    .I /proc
    so that tools such as
    .BR ps (1)
    work correctly.
    .\" mount -t proc proc /proc
    (If a new mount namespace is simultaneously created by including
    .BR CLONE_NEWNS
    in the
    .IR flags
    argument of
    .BR clone (2)
    or
    .BR unshare (2),
    then it isn't necessary to change the root directory:
    a new procfs instance can be mounted directly over
    .IR /proc .)
    
    Calling
    .BR readlink (2)
    on the path
    .I /proc/self
    yields the process ID of the caller in the PID namespace of the procfs mount
    (i.e., the PID namespace of the process that mounted the procfs).
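
Reading the link takes only a few lines. The following sketch uses a
hypothetical helper name, proc_self_pid(), and assumes that a procfs
instance is mounted at /proc.

```c
#include <errno.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper. Returns the PID named by the /proc/self symbolic
 * link, or -1 if the link cannot be read or does not hold a number.
 */
long proc_self_pid(void)
{
    char buf[32];
    ssize_t n = readlink("/proc/self", buf, sizeof(buf) - 1);
    if (n == -1)
        return -1;
    buf[n] = '\0';                /* readlink() does not terminate */

    char *end;
    errno = 0;
    long pid = strtol(buf, &end, 10);
    if (errno != 0 || end == buf || *end != '\0')
        return -1;
    return pid;
}
```

If the result differs from the value returned by getpid(2), that suggests
the procfs instance was mounted from a different PID namespace than the
one in which the caller resides.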
    
    When a process ID is passed over a UNIX domain socket to a
    process in a different PID namespace (see the description of
    .B SCM_CREDENTIALS
    in
    .BR unix (7)),
    it is translated into the corresponding PID value in
    the receiving process's PID namespace.
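
The mechanics of passing a PID this way can be sketched as follows. This
example is not from the original page: pid_seen_by_receiver() is a
hypothetical name, and for brevity both ends of the socket pair live in
the same process, so the PID that arrives is simply that of the caller.
With the receiver in another PID namespace, the same receiving code would
see the translated value instead.

```c
#define _GNU_SOURCE
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/*
 * Hypothetical helper. Sends the caller's credentials over a UNIX domain
 * socket pair and returns the PID as delivered to the receiving end,
 * or -1 on error.
 */
pid_t pid_seen_by_receiver(void)
{
    int sv[2];
    int on = 1;
    pid_t result = -1;
    char byte = 'x';
    struct ucred cred;
    struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
    union {
        char buf[CMSG_SPACE(sizeof(struct ucred))];
        struct cmsghdr align;     /* forces suitable alignment */
    } ctl;
    struct msghdr msg;
    struct cmsghdr *cmsg;

    if (socketpair(AF_UNIX, SOCK_DGRAM, 0, sv) == -1)
        return -1;

    /* The receiver must ask for credentials to be delivered. */
    if (setsockopt(sv[1], SOL_SOCKET, SO_PASSCRED, &on, sizeof(on)) == -1)
        goto out;

    /* Sender: attach an SCM_CREDENTIALS control message. */
    cred.pid = getpid();
    cred.uid = getuid();
    cred.gid = getgid();
    memset(&ctl, 0, sizeof(ctl));
    memset(&msg, 0, sizeof(msg));
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = ctl.buf;
    msg.msg_controllen = sizeof(ctl.buf);
    cmsg = CMSG_FIRSTHDR(&msg);
    cmsg->cmsg_level = SOL_SOCKET;
    cmsg->cmsg_type = SCM_CREDENTIALS;
    cmsg->cmsg_len = CMSG_LEN(sizeof(struct ucred));
    memcpy(CMSG_DATA(cmsg), &cred, sizeof(cred));
    if (sendmsg(sv[0], &msg, 0) == -1)
        goto out;

    /* Receiver: read the credentials as filled in by the kernel. */
    memset(&ctl, 0, sizeof(ctl));
    msg.msg_control = ctl.buf;
    msg.msg_controllen = sizeof(ctl.buf);
    if (recvmsg(sv[1], &msg, 0) == -1)
        goto out;
    for (cmsg = CMSG_FIRSTHDR(&msg); cmsg != NULL;
         cmsg = CMSG_NXTHDR(&msg, cmsg)) {
        if (cmsg->cmsg_level == SOL_SOCKET &&
            cmsg->cmsg_type == SCM_CREDENTIALS) {
            memcpy(&cred, CMSG_DATA(cmsg), sizeof(cred));
            result = cred.pid;
        }
    }

out:
    close(sv[0]);
    close(sv[1]);
    return result;
}
```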
    .SH CONFORMING TO
    Namespaces are a Linux-specific feature.
    .SH SEE ALSO
    .BR unshare (1),
    .BR clone (2),
    .BR setns (2),
    .BR unshare (2),
    .BR proc (5),
    .BR credentials (7),
    .BR capabilities (7),
    .BR user_namespaces (7),
    .BR switch_root (8)
    