From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: from mail-wm0-f68.google.com ([74.125.82.68]:32774 "EHLO
	mail-wm0-f68.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
	with ESMTP id S1751145AbcGYOq3 (ORCPT ); Mon, 25 Jul 2016 10:46:29 -0400
Subject: Re: [PATCH 0/5 RFC] Add an interface to discover relationships between namespaces
To: "Eric W. Biederman"
References: <1468520419-28220-1-git-send-email-avagin@openvz.org>
	<20160721210650.GA10989@outlook.office365.com>
	<1515f5f2-5a49-fcab-61f4-8b627d3ba3e2@gmail.com>
	<87lh0pg8jx.fsf@x220.int.ebiederm.org>
Cc: mtk.manpages@gmail.com, Andrey Vagin, Serge Hallyn, Andrew Vagin,
	"criu@openvz.org", Linux API, Linux Containers, LKML,
	James Bottomley, linux-fsdevel, Alexander Viro
From: "Michael Kerrisk (man-pages)"
Message-ID: <44ca0e41-dc92-45b1-2a6c-c41a048a072d@gmail.com>
Date: Mon, 25 Jul 2016 16:46:25 +0200
MIME-Version: 1.0
In-Reply-To: <87lh0pg8jx.fsf@x220.int.ebiederm.org>
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Transfer-Encoding: 8bit
Sender: linux-fsdevel-owner@vger.kernel.org
List-ID:

Hi Eric,

On 07/25/2016 03:18 PM, Eric W. Biederman wrote:
> "Michael Kerrisk (man-pages)" writes:
>
>> Hi Andrey,
>>
>> On 07/22/2016 08:25 PM, Andrey Vagin wrote:
>>> On Thu, Jul 21, 2016 at 11:48 PM, Michael Kerrisk (man-pages)
>>> wrote:
>>>> Hi Andrey,
>>>>
>>>> On 07/21/2016 11:06 PM, Andrew Vagin wrote:
>>>>>
>>>>> On Thu, Jul 21, 2016 at 04:41:12PM +0200, Michael Kerrisk (man-pages)
>>>>> wrote:
>>>>>>
>>>>>> Hi Andrey,
>>>>>>
>>>>>> On 07/14/2016 08:20 PM, Andrey Vagin wrote:
>>>>>
>>>>>>
>>>>>> Could you add here an explanation of the API in detail: what do these
>>>>>> FDs refer to, and how do you use them to solve the use case? And could
>>>>>> you add that info to the commit messages please.
>>>>>
>>>>>
>>>>> Hi Michael,
>>>>>
>>>>> A patch for man-pages is attached. It adds the following text to
>>>>> namespaces(7).
>>>>>
>>>>> Since Linux 4.X, the following ioctl(2) calls are supported for
>>>>> namespace file descriptors. The correct syntax is:
>>>>>
>>>>>       fd = ioctl(ns_fd, ioctl_type);
>>>>>
>>>>> where ioctl_type is one of the following:
>>>>>
>>>>> NS_GET_USERNS
>>>>>       Returns a file descriptor that refers to an owning user
>>>>>       namespace.
>>>>>
>>>>> NS_GET_PARENT
>>>>>       Returns a file descriptor that refers to a parent namespace.
>>>>>       This ioctl(2) can be used for pid and user namespaces. For user
>>>>>       namespaces, NS_GET_PARENT and NS_GET_USERNS have the same
>>>>>       meaning.
>>
>> For each of the above, I think it is worth mentioning that the
>> close-on-exec flag is set for the returned file descriptor.
>
> Hmm. That is an odd default.

Why do you say that? It's pretty common as the default for various
APIs that create new FDs these days. (There's of course a strong
argument that the original UNIX default was a design blunder...)

>>>>>
>>>>> In addition to generic ioctl(2) errors, the following specific ones can
>>>>> occur:
>>>>>
>>>>> EINVAL NS_GET_PARENT was called for a nonhierarchical namespace.
>>>>>
>>>>> EPERM  The requested namespace is outside of the current namespace
>>>>>        scope.
>>
>> Perhaps add "and the caller does not have CAP_SYS_ADMIN in the initial
>> user namespace"?
>
> Having looked at that bit of code I don't think capabilities really
> have a role to play.

Yes, I caught up with that now. I'll wait to see how this plays out in
the next patch version.

>>>>> ENOENT ns_fd refers to the init namespace.
>>>>
>>>>
>>>> Thanks for this. But still part of the question remains unanswered.
>>>> How do we (in user-space) use the file descriptors to answer any of
>>>> the questions that this patch series was designed to solve? (This
>>>> info should be in the commit message and the man-pages patch.)
>>>
>>> I'm sorry, but I am not sure that I understand what you ask.
>>>
>>> Here are the original questions:
>>>
>>>   Someone else then asked me a question that led me to wonder about
>>>   generally introspecting on the parental relationships between user
>>>   namespaces and the association of other namespaces types with user
>>>   namespaces. One use would be visualization, in order to understand the
>>>   running system. Another would be to answer the question I already
>>>   mentioned: what capability does process X have to perform operations
>>>   on a resource governed by namespace Y?
>>>
>>> Here is an example which shows how we can get the owning namespace
>>> inode number by using these ioctl-s.
>>>
>>> $ ls -l /proc/13929/ns/pid
>>> lrwxrwxrwx 1 root root 0 Jul 22 21:03 /proc/13929/ns/pid -> 'pid:[4026532228]'
>>>
>>> $ ./nsowner /proc/13929/ns/pid
>>> user:[4026532227]
>>>
>>> The owning user namespace for pid:[4026532228] is user:[4026532227].
>>>
>>> The nsowner tool is compiled from this code:
>>>
>>> int main(int argc, char *argv[])
>>> {
>>>         char buf[128], path[] = "/proc/self/fd/0123456789";
>>>         int ns, uns, ret;
>>>
>>>         ns = open(argv[1], O_RDONLY);
>>>         if (ns < 0)
>>>                 return 1;
>>>
>>>         uns = ioctl(ns, NS_GET_USERNS);
>>>         if (uns < 0)
>>>                 return 1;
>>>
>>>         snprintf(path, sizeof(path), "/proc/self/fd/%d", uns);
>>>         ret = readlink(path, buf, sizeof(buf) - 1);
>>>         if (ret < 0)
>>>                 return 1;
>>>         buf[ret] = 0;
>>>
>>>         printf("%s\n", buf);
>>>
>>>         return 0;
>>> }
>>
>> So, from my point of view, the important piece that was missing from
>> your commit message was the note to use readlink("/proc/self/fd/%d")
>> on the returned FDs. I think that detail needs to be part of the
>> commit message (and also the man page text). I think it would even be
>> helpful to include the above program as part of the commit message:
>> it helps people more quickly grasp the API.
>
> Please, please make the standard way to compare these things fstat.
> That is much less magic than a symlink, and a little more future proof.
> Possibly even kcmp.
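
A minimal sketch of what an fstat()-based comparison could look like,
for illustration only. The helper names same_ns() and same_ns_path() are
hypothetical and are not part of the patch series; the sketch assumes
that two descriptors refer to the same namespace exactly when their
st_dev and st_ino fields both match.

```c
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Hypothetical helper (not part of the patch series): returns 1 if the
 * two file descriptors refer to the same object, 0 if they differ, and
 * -1 if fstat() fails. Assumption: for namespace descriptors, a matching
 * (st_dev, st_ino) pair identifies the same namespace.
 */
int same_ns(int fd_a, int fd_b)
{
	struct stat st_a, st_b;

	if (fstat(fd_a, &st_a) < 0 || fstat(fd_b, &st_b) < 0)
		return -1;

	return st_a.st_dev == st_b.st_dev && st_a.st_ino == st_b.st_ino;
}

/*
 * Hypothetical convenience wrapper: opens both paths read-only, compares
 * them with same_ns(), and closes the descriptors again.
 */
int same_ns_path(const char *path_a, const char *path_b)
{
	int fd_a, fd_b, ret;

	fd_a = open(path_a, O_RDONLY);
	if (fd_a < 0)
		return -1;

	fd_b = open(path_b, O_RDONLY);
	if (fd_b < 0) {
		close(fd_a);
		return -1;
	}

	ret = same_ns(fd_a, fd_b);
	close(fd_a);
	close(fd_b);
	return ret;
}
```

With suitable permissions, same_ns_path("/proc/self/ns/pid",
"/proc/1/ns/pid") would report whether the caller shares a PID namespace
with PID 1. The same check could be applied to a descriptor returned by
NS_GET_USERNS or NS_GET_PARENT, without going through readlink().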

As in fstat() to get the st_ino field, right?

Cheers,

Michael

> At some point we will care about migrating a migrating sub-container and we
> may have to have some minor changes.
>
> Eric
>

-- 
Michael Kerrisk
Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
Linux/UNIX System Programming Training: http://man7.org/training/