* Re: [PATCH 00/23] proc: Introduce /proc/namespaces/ directory to expose namespaces lineary
From: Eric W. Biederman @ 2020-08-17 18:53 UTC
To: Christian Brauner
Cc: Kirill Tkhai, Andrei Vagin, adobriyan, viro, davem, akpm, areber,
	serge, linux-kernel, linux-fsdevel, Pavel Tikhomirov, linux-api

Christian Brauner <christian.brauner@ubuntu.com> writes:

> On Mon, Aug 17, 2020 at 10:48:01AM -0500, Eric W. Biederman wrote:
>>
>> Creating names in the kernel for namespaces is very difficult and
>> problematic.  I have not seen anything that looks like all of the
>> problems have been solved with restoring these new names.
>>
>> When your filter for your list of namespaces is the user namespace,
>> creating a new directory in proc is highly questionable.
>>
>> As everyone uses proc, placing this functionality in proc also
>> amplifies the problem of creating names.
>>
>>
>> Rather than proc, having a way to mount a namespace filesystem filtered
>> by the user namespace of the mounter is likely to have many, many fewer
>> problems.  Especially as we are limiting/not allowing new non-process
>> things and ideally finding a way to remove the non-process things.
>>
>>
>> Kirill you have a good point that taking the case where a pid namespace
>> does not exist in a user namespace is likely quite unrealistic.
>>
>> Kirill mentioned upthread that the list of namespaces is the list that
>> can appear in a container.  Except by discipline in creating containers
>> it is not possible to know which namespaces may appear attached to a
>> process.  It is possible to be very creative with setns, and violate any
>> constraint you may have.  Which means your filtered list of namespaces
>> may not contain all of the namespaces used by a set of processes.  This
>
> Indeed. We use setns() quite creatively when intercepting syscalls and
> when attaching to a container.
>
>> further argues that attaching the list of namespaces to proc does not
>> make sense.
>>
>> Andrei has a good point that placing the names in a hierarchy by
>> user namespace has the potential to create more freedom when
>> assigning names to namespaces, as it means the names for namespaces
>> do not need to be globally unique, while still allowing the names
>> to stay the same.
>>
>>
>> To recap, the possibilities for names for namespaces that I have seen
>> mentioned in this thread are:
>>   - Names per mount
>>   - Names per user namespace
>>
>> I personally suspect that names per mount are likely to be so flexible
>> they are confusing, while names per user namespace are likely to be
>> rigid, possibly too rigid to use.
>>
>> It all depends upon how everything is used.  I have yet to see a
>> complete story of how these names will be generated and used.  So I can
>> not really judge.
>
> So I haven't fully understood either what the motivation for this
> patchset is.
> I can just speak to the use-case I had when I started prototyping
> something similar: We needed a way to get a view on all namespaces
> that exist on the system because we wanted a way to do namespace
> debugging on a live system. This interface could've easily lived in
> debugfs. The main point was that it should contain all namespaces.
> Note that it wasn't supposed to be a hierarchical format; it was only
> meant to list all namespaces and be accessible to real root.
> The interface here is way more flexible/complex and I haven't yet
> figured out what exactly it is supposed to be used for.
>
>>
>>
>> Let me add another take on this idea that might give this work a path
>> forward.  If I were solving this I would explore giving nsfs directories
>> per user namespace, and a way to mount it that exposed the directory of
>> the mounter's current user namespace (something like btrfs snapshots).
>>
>> Hmm.  For the user namespace directory I think I would give it a file
>> "ns" that can be opened to get a file handle on the user namespace.
>> Plus a set of subdirectories ("cgroup", "ipc", "mnt", "net", "pid",
>> "user", "uts") for each type of namespace.  In each directory I think
>> I would just have a 64bit counter and each new entry I would assign the
>> next number from that counter.
>>
>> The restore could either have the ability to rename files or simply the
>> ability to bump the counter (like we do with pids) so the names of the
>> namespaces can be restored.
>>
>> That winds up making a user namespace the namespace of namespaces, so
>> I am not 100% sure about the idea.
>
> I think you're right that we need to understand better what the use-case
> is. If I understand your suggestion correctly it wouldn't allow showing
> nested user namespaces if the nsfs mount is per-user namespace.

So what I was thinking is that we have the user namespace directories
and that the mount code would perform a bind mount such that the
directory that matches the mounter's user namespace is the root
directory.

> Let me throw in a crazy idea: couldn't we just make the ioctl_ns() walk
> a namespace hierarchy? For example, you could pass in a user namespace
> fd and then you'd get back a struct with handles for fds for the
> namespaces owned by that user namespace and then you could use
> NS_GET_USERNS/NS_GET_PARENT to walk upwards from the user namespace fd
> passed in initially and so on? Or something similar/simpler. This would
> also decouple this from procfs somewhat.

Hmm.  That would remove the need to have names.  We could just keep a
list of the namespaces in creation order.  Hopefully the CRIU folks
could preserve that creation order without too much trouble.

Say with an ioctl NS_NEXT_CREATION which takes two fds, and returns a
new file descriptor.  The arguments would be the user namespace and -1
or the file descriptor last returned from NS_NEXT_CREATION.

Assuming that is not difficult for CRIU to restore, that would be a very
simple patch.

Eric

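
The cursor-style iteration proposed at the end of this message can be
pictured with a small userspace model.  This is only a sketch under stated
assumptions: NS_NEXT_CREATION was floated in this mail and is not an
existing ioctl, and the names below (UserNamespace, Namespace,
ns_next_creation, iterate) are hypothetical stand-ins, with Python objects
playing the role of file descriptors and None playing the role of -1.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Optional


@dataclass(eq=False)
class Namespace:
    """Hypothetical stand-in for an open namespace file descriptor."""
    kind: str  # e.g. "net", "pid", "mnt"


@dataclass(eq=False)
class UserNamespace:
    """Toy user namespace that remembers what it owns, in creation order."""
    owned: List[Namespace] = field(default_factory=list)

    def create(self, kind: str) -> Namespace:
        ns = Namespace(kind)
        self.owned.append(ns)  # appended last, so list order is creation order
        return ns


def ns_next_creation(userns: UserNamespace,
                     last: Optional[Namespace]) -> Optional[Namespace]:
    """Model of the proposed call: `last` is None (the "-1" case) to start,
    or the object returned by the previous call.  Returns None at the end."""
    if last is None:
        index = 0
    else:
        for i, ns in enumerate(userns.owned):
            if ns is last:  # identity, like comparing fds to the same object
                index = i + 1
                break
        else:
            raise ValueError("cursor is not owned by this user namespace")
    return userns.owned[index] if index < len(userns.owned) else None


def iterate(userns: UserNamespace) -> Iterator[Namespace]:
    """Walk every namespace owned by `userns`, oldest first."""
    cursor = ns_next_creation(userns, None)
    while cursor is not None:
        yield cursor
        cursor = ns_next_creation(userns, cursor)
```

In this model a restore tool would only have to recreate namespaces in the
recorded order to reproduce the same walk.  What a real implementation
should do when the cursor's namespace is destroyed between two calls is not
discussed in the thread and is not modelled here.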
* Re: [PATCH 00/23] proc: Introduce /proc/namespaces/ directory to expose namespaces lineary
From: Christian Brauner @ 2020-07-30 13:38 UTC
To: Kirill Tkhai
Cc: viro, adobriyan, davem, ebiederm, akpm, areber, serge, linux-kernel,
	linux-fsdevel, linux-api

[Cc: linux-api]

On Thu, Jul 30, 2020 at 03:08:53PM +0200, Christian Brauner wrote:
> On Thu, Jul 30, 2020 at 02:59:20PM +0300, Kirill Tkhai wrote:
> > Currently, there is no way to list or iterate all or a subset of the
> > namespaces in the system. Some namespaces are exposed in /proc/[pid]/ns/
> > directories, but some may also exist only as open files, which are not
> > attached to a process. When a namespace open fd is sent over a unix
> > socket and then closed, it is impossible to know whether the namespace
> > exists or not.
> >
> > Also, even if a namespace is exposed as attached to a process or as an
> > open file, iteration over /proc/*/ns/* or /proc/*/fd/* namespaces is not
> > fast, because the cost multiplies with the number of tasks and fds.
> >
> > This patchset introduces a new /proc/namespaces/ directory, which exposes
> > the subset of permitted namespaces in a linear view:
> >
> > # ls /proc/namespaces/ -l
> > lrwxrwxrwx 1 root root 0 Jul 29 16:50 'cgroup:[4026531835]' -> 'cgroup:[4026531835]'
> > lrwxrwxrwx 1 root root 0 Jul 29 16:50 'ipc:[4026531839]' -> 'ipc:[4026531839]'
> > lrwxrwxrwx 1 root root 0 Jul 29 16:50 'mnt:[4026531840]' -> 'mnt:[4026531840]'
> > lrwxrwxrwx 1 root root 0 Jul 29 16:50 'mnt:[4026531861]' -> 'mnt:[4026531861]'
> > lrwxrwxrwx 1 root root 0 Jul 29 16:50 'mnt:[4026532133]' -> 'mnt:[4026532133]'
> > lrwxrwxrwx 1 root root 0 Jul 29 16:50 'mnt:[4026532134]' -> 'mnt:[4026532134]'
> > lrwxrwxrwx 1 root root 0 Jul 29 16:50 'mnt:[4026532135]' -> 'mnt:[4026532135]'
> > lrwxrwxrwx 1 root root 0 Jul 29 16:50 'mnt:[4026532136]' -> 'mnt:[4026532136]'
> > lrwxrwxrwx 1 root root 0 Jul 29 16:50 'net:[4026531993]' -> 'net:[4026531993]'
> > lrwxrwxrwx 1 root root 0 Jul 29 16:50 'pid:[4026531836]' -> 'pid:[4026531836]'
> > lrwxrwxrwx 1 root root 0 Jul 29 16:50 'time:[4026531834]' -> 'time:[4026531834]'
> > lrwxrwxrwx 1 root root 0 Jul 29 16:50 'user:[4026531837]' -> 'user:[4026531837]'
> > lrwxrwxrwx 1 root root 0 Jul 29 16:50 'uts:[4026531838]' -> 'uts:[4026531838]'
> >
> > A namespace ns is exposed if its user_ns is permitted from /proc's
> > pid_ns. I.e., /proc is related to a pid_ns, so in /proc/namespaces we
> > show only a ns for which
> >
> > 	in_userns(pid_ns->user_ns, ns->user_ns).
> >
> > In case ns is itself a user_ns:
> >
> > 	in_userns(pid_ns->user_ns, ns).
> >
> > The patchset follows these steps:
> >
> > 1) A generic counter in ns_common is introduced instead of separate
> >    counters for every ns type (net::count, uts_namespace::kref,
> >    user_namespace::count, etc). Patches [1-8];
> > 2) Patch [9] introduces an IDR to link and iterate alive namespaces;
> > 3) Patch [10] is refactoring;
> > 4) Patch [11] actually adds the /proc/namespaces directory and fs methods;
> > 5) Patches [12-23] make every namespace use the added methods
> >    and appear in the /proc/namespaces directory.
> >
> > This may be useful for writing efficient debug utils (say, quickly
> > building a network topology) and checkpoint/restore software.
>
> Kirill,
>
> Thanks for working on this!
> We have a need for this functionality too for namespace introspection.
> I actually had a prototype of this as well, but mine was based on debugfs;
> /proc/namespaces seems like a good place, though.
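
The visibility rule quoted above (show ns only when in_userns() holds for
the user namespace of /proc's pid_ns) can be illustrated with a small
userspace model.  This is a sketch, not code from the patchset: UserNs, Ns
and visible() are hypothetical names, and in_userns() here only mirrors the
idea of the kernel helper (walk the child up to the ancestor's level, then
compare), without claiming to match the kernel source line for line.

```python
from dataclasses import dataclass
from typing import Optional, Union


@dataclass(eq=False)
class UserNs:
    """Toy user namespace; the initial one has no parent."""
    parent: Optional["UserNs"] = None

    @property
    def level(self) -> int:
        # Depth in the hierarchy: 0 for the initial user namespace.
        return 0 if self.parent is None else self.parent.level + 1


@dataclass(eq=False)
class Ns:
    """Toy non-user namespace (net, pid, mnt, ...) owned by a user namespace."""
    kind: str
    user_ns: UserNs


def in_userns(ancestor: UserNs, child: UserNs) -> bool:
    """True if `child` is `ancestor` itself or one of its descendants."""
    ns = child
    while ns.level > ancestor.level:
        ns = ns.parent  # climb until both are at the same depth
    return ns is ancestor


def visible(proc_user_ns: UserNs, ns: Union[Ns, UserNs]) -> bool:
    """Would `ns` be listed by a /proc whose pid_ns is owned by proc_user_ns?"""
    if isinstance(ns, UserNs):
        return in_userns(proc_user_ns, ns)      # ns is a user_ns
    return in_userns(proc_user_ns, ns.user_ns)  # any other ns type
```

For example, with a container user namespace created below the initial one,
a /proc mounted inside the container would list a net namespace owned by
the container (or by anything nested under it) but not one owned by a
sibling container or by the initial user namespace.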