linux-api.vger.kernel.org archive mirror
From: "Serge E. Hallyn" <serge-A9i7LUbDfNHQT0dZR+AlfA@public.gmane.org>
To: Konstantin Khlebnikov
	<khlebnikov-XoJtRXgx1JseBXzfvpsJ4g@public.gmane.org>
Cc: "Serge Hallyn"
	<serge.hallyn-GeWIH/nMZzLQT0dZR+AlfA@public.gmane.org>,
	"Stéphane Graber"
	<stgraber-GeWIH/nMZzLQT0dZR+AlfA@public.gmane.org>,
	linux-api-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
	containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA@public.gmane.org,
	"Oleg Nesterov" <oleg-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
	"Eric W. Biederman"
	<ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>,
	"Andrew Morton"
	<akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b@public.gmane.org>,
	"Linus Torvalds"
	<torvalds-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b@public.gmane.org>
Subject: Re: [PATCH RFC] pidns: introduce syscall getvpid
Date: Wed, 16 Sep 2015 09:39:39 -0500
Message-ID: <20150916143939.GA32226@mail.hallyn.com>
In-Reply-To: <55F91C3D.1040209-XoJtRXgx1JseBXzfvpsJ4g@public.gmane.org>

On Wed, Sep 16, 2015 at 10:37:33AM +0300, Konstantin Khlebnikov wrote:
> On 15.09.2015 20:41, Serge Hallyn wrote:
> >Quoting Stéphane Graber (stgraber-GeWIH/nMZzLQT0dZR+AlfA@public.gmane.org):
> >>On Tue, Sep 15, 2015 at 06:01:38PM +0300, Konstantin Khlebnikov wrote:
> >>>On 15.09.2015 17:27, Eric W. Biederman wrote:
> >>>>Konstantin Khlebnikov <khlebnikov-XoJtRXgx1JseBXzfvpsJ4g@public.gmane.org> writes:
> >>>>
> >>>>>pid_t getvpid(pid_t pid, pid_t source, pid_t target);
> >>>>>
> >>>>>This syscall converts pid from one pid-ns into pid in another pid-ns:
> >>>>>it takes @pid in namespace of @source task (zero for current) and
> >>>>>returns related pid in namespace of @target task (zero for current too).
> >>>>>If pid is unreachable from target pid-ns then it returns zero.
> >>>>
> >>>>This interface as presented is inherently racy.  It would be better
> >>>>if source and target were file descriptors referring to the namespaces
> >>>>you wish to translate between.
> >>>
> >>>Yep, it's racy, as is any operation on non-child pids.
> >>>Even with file descriptors for source/target the result will still be racy.
> >>>
> >>>>
> >>>>>Such conversion is required for interaction between processes from
> >>>>>different pid-namespaces. For example when system service talks with
> >>>>>client from isolated container via socket about task in container:
> >>>>
> >>>>Sockets are already supported.  At least the metadata of sockets is.
> >>>>
> >>>>Maybe we need this but I am not convinced of its utility.
> >>>>
> >>>>What are you trying to do that motivates this?
> >>>
> >>>I'm working on a hierarchical container management system which
> >>>allows creating and controlling nested sub-containers from containers
> >>>( https://github.com/yandex/porto ). The main server runs on the host and
> >>>has to interact with all levels of nested namespaces. This syscall
> >>>makes some operations much easier: the server must remember only the pid in
> >>>the host pid namespace and convert it into the right vpid on demand.
> >>
> >>Note that as Eric said earlier, sending a PID inside a ucred through a
> >>unix socket will have the pid translated.
> >>
> >>So while your solution certainly should be faster, you can already achieve
> >>what you want today by doing:
> >>
> >>== Translate PID in container to PID in host
> >>  - open a socket
> >>  - setns to container's pidns
> >>  - send ucred from that container containing the requested container PID
> >>  - host sees the host PID
> >>
> >>== Translate PID on host to PID in container
> >>  - open a socket
> >>  - setns to container's pidns
> >>  - send ucred from the host containing the requested host PID
> >>    (send will fail if the host PID isn't part of that container)
> >>  - container sees the container PID
> >
> >In addition, since commit e4bc332451 : /proc/PID/status: show all sets of pid according to ns
> >we now also have 'NSpid' etc in /proc/$$/status.
> >
> 
> As far as I can see, this works well only for converting a host pid
> into a virtual one.
> 
> The reverse conversion is troublesome: we have to scan all pids in the host
> procfs and somehow filter tasks from the container and its sub-pid-ns.
> Or am I missing something trivial?

Ah, no that doesn't help with this.
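
For reference, a minimal sketch of the NSpid lookup discussed above, in
Python. The helper names parse_nspid and host_to_container_pid are
hypothetical and not from the thread. It assumes a kernel that has the
NSpid line in /proc/PID/status, where the entries are the task's pid in
each nested pid namespace, outermost first. It only covers the
host-to-container direction and does nothing for the reverse lookup.

```python
def parse_nspid(status_text):
    """Return the NSpid entries from the text of /proc/PID/status as ints.

    The first entry is the pid as seen from the procfs mount's namespace,
    the last one is the pid in the task's own (innermost) namespace.
    """
    for line in status_text.splitlines():
        if line.startswith("NSpid:"):
            return [int(field) for field in line.split()[1:]]
    return []  # no NSpid line, e.g. a kernel without that commit


def host_to_container_pid(host_pid):
    """Hypothetical helper: innermost pid of a task, given its host pid."""
    with open(f"/proc/{host_pid}/status") as status:
        pids = parse_nspid(status.read())
    return pids[-1] if pids else None
```

Going the other way with this interface still means reading the status
file of every task in the host procfs and matching on the inner pid.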

What Stéphane describes is what I've done in several projects.
Getting it right is however actually quite tricky.  I'm not
convinced it's at the level of "since you can do (sweep hands)
all this, we don't need a simple syscall to do it."
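
To give an idea of the moving parts, here is a rough Python sketch of the
ucred step only: passing a pid as SCM_CREDENTIALS over a unix socket and
reading it back on the other end. The function names are hypothetical. It
deliberately stays inside a single pid namespace, so the pid arrives
unchanged; the setns and fork steps that put sender and receiver in
different namespaces are left out, and that is presumably where much of
the difficulty lies.

```python
import os
import socket
import struct

# struct ucred from <sys/socket.h>: pid_t pid; uid_t uid; gid_t gid
UCRED = struct.Struct("iII")


def send_pid(sock, pid):
    """Send one byte with an explicit ucred carrying `pid`.

    Without CAP_SYS_ADMIN the kernel only accepts the sender's own pid.
    """
    creds = UCRED.pack(pid, os.getuid(), os.getgid())
    sock.sendmsg([b"x"], [(socket.SOL_SOCKET, socket.SCM_CREDENTIALS, creds)])


def recv_pid(sock):
    """Receive one message and return the pid from its ucred, if any."""
    _, ancdata, _, _ = sock.recvmsg(1, socket.CMSG_SPACE(UCRED.size))
    for level, ctype, data in ancdata:
        if level == socket.SOL_SOCKET and ctype == socket.SCM_CREDENTIALS:
            pid, _uid, _gid = UCRED.unpack(data[:UCRED.size])
            return pid
    return None


def roundtrip_own_pid():
    """Send our own pid across a socketpair and return what arrives."""
    a, b = socket.socketpair(socket.AF_UNIX, socket.SOCK_DGRAM)
    with a, b:
        # The receiver must ask for credentials to be delivered.
        b.setsockopt(socket.SOL_SOCKET, socket.SO_PASSCRED, 1)
        send_pid(a, os.getpid())
        return recv_pid(b)
```

With sender and receiver in different pid namespaces, the value returned
by recv_pid would be the pid as seen from the receiver's namespace.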

So I'd encourage you to resend using namespace inode fds for
source and target as Eric suggested.  We still may decide that
the syscall isn't needed, but it's a trivial change to your
patch and removes that race.  And I'm not convinced it's not
needed.
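
As an illustration of what a namespace fd buys: opening /proc/PID/ns/pid
gives a descriptor that keeps referring to the same namespace even if the
task exits or its pid is reused afterwards. A small Python sketch, with
hypothetical helper names; it only shows obtaining and comparing such
descriptors, not the proposed syscall.

```python
import os


def open_pidns(pid="self"):
    """Open the pid namespace of a task; the fd pins that namespace."""
    return os.open(f"/proc/{pid}/ns/pid", os.O_RDONLY)


def same_pidns(fd_a, fd_b):
    """Two namespace fds refer to the same namespace when their
    device and inode numbers match."""
    a, b = os.fstat(fd_a), os.fstat(fd_b)
    return (a.st_dev, a.st_ino) == (b.st_dev, b.st_ino)
```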

-serge
