* [RFC PATCH] cgroup namespaces: add a 'nsroot=' mountinfo field @ 2016-03-21 23:41 Serge E. Hallyn [not found] ` <20160321234133.GA22463-7LNsyQBKDXoIagZqoN9o3w@public.gmane.org> 0 siblings, 1 reply; 22+ messages in thread From: Serge E. Hallyn @ 2016-03-21 23:41 UTC (permalink / raw) To: tj-DgEjT+Ai2ygdnm+yROfE0A, linux-api-u79uwXL29TY76Z2rM5mHXA, adityakali-hpIqsD4AKlfQT0dZR+AlfA, Linux Containers, Eric W. Biederman, cgroups-u79uwXL29TY76Z2rM5mHXA, lkml One practical problem I've found with cgroup namespaces is that there is no way to disambiguate between a cgroupfs mount which was done in a cgroup namespace, and a bind mount of a cgroupfs directory. So whether I do unshare --cgroup -- bash -c "mount -t cgroup -o freezer f /mnt; cat /proc/self/mountinfo" or whether I just mount --bind /sys/fs/cgroup/freezer/$(awk -F: '/freezer/ { print $3 }' /proc/self/cgroup) /mnt 'mount root' field (field 3) in /proc/self/mountinfo will show the same thing, the result of awk -F: '/freezer/ { print $3 }' /proc/self/cgroup. This patch adds a 'nsroot=' field to cgroup mountinfo entries, so that userspace can distinguish a mount made in a cgroup namespace from a bind mount from a cgroup subdirectory. Signed-off-by: Serge Hallyn <serge.hallyn-GeWIH/nMZzLQT0dZR+AlfA@public.gmane.org> --- fs/kernfs/mount.c | 2 +- include/linux/kernfs.h | 3 ++- kernel/cgroup.c | 29 ++++++++++++++++++++++++++++- 3 files changed, 31 insertions(+), 3 deletions(-) diff --git a/fs/kernfs/mount.c b/fs/kernfs/mount.c index b67dbcc..58f59fd 100644 --- a/fs/kernfs/mount.c +++ b/fs/kernfs/mount.c @@ -36,7 +36,7 @@ static int kernfs_sop_show_options(struct seq_file *sf, struct dentry *dentry) struct kernfs_syscall_ops *scops = root->syscall_ops; if (scops && scops->show_options) - return scops->show_options(sf, root); + return scops->show_options(sf, dentry, root); return 0; } diff --git a/include/linux/kernfs.h b/include/linux/kernfs.h index c06c442..3124b91 100644 --- a/include/linux/kernfs.h +++ b/include/linux/kernfs.h @@ -145,7 +145,8 @@ struct kernfs_node { */ struct kernfs_syscall_ops { int (*remount_fs)(struct kernfs_root *root, int *flags, char *data); - int (*show_options)(struct seq_file *sf, struct kernfs_root *root); + int (*show_options)(struct seq_file *sf, struct dentry *dentry, + struct kernfs_root *root); int (*mkdir)(struct kernfs_node *parent, const char *name, umode_t mode); diff --git a/kernel/cgroup.c b/kernel/cgroup.c index 671dc05..806d1e7 100644 --- a/kernel/cgroup.c +++ b/kernel/cgroup.c @@ -1593,7 +1593,32 @@ static int rebind_subsystems(struct cgroup_root *dst_root, u16 ss_mask) return 0; } -static int cgroup_show_options(struct seq_file *seq, +static void cgroup_show_nsroot(struct seq_file *seq, struct dentry *dentry, + struct kernfs_root *kf_root) +{ + struct kernfs_node *d_kn = dentry->d_fsdata; + char *nsroot; + int len, ret; + + if (!kf_root) + return; + len = kernfs_path_from_node(d_kn, kf_root->kn, NULL, 0); + if (len <= 0) + return; + nsroot = kzalloc(len + 1, GFP_ATOMIC); + if (!nsroot) + return; + ret = kernfs_path_from_node(d_kn, kf_root->kn, nsroot, len + 1); + if (ret <= 0 || ret > len) + goto out; + + seq_show_option(seq, "nsroot", nsroot); + +out: + kfree(nsroot); +} + +static int cgroup_show_options(struct seq_file *seq, struct dentry *dentry, struct kernfs_root *kf_root) { struct cgroup_root *root = cgroup_root_from_kf(kf_root); @@ -1619,6 +1644,8 @@ static int cgroup_show_options(struct seq_file *seq, seq_puts(seq, ",clone_children"); if (strlen(root->name)) seq_show_option(seq, "name", root->name); + cgroup_show_nsroot(seq, dentry, kf_root); + return 0; } -- 2.7.3 ^ permalink raw reply related [flat|nested] 22+ messages in thread
[parent not found: <20160321234133.GA22463-7LNsyQBKDXoIagZqoN9o3w@public.gmane.org>]
* Re: [RFC PATCH] cgroup namespaces: add a 'nsroot=' mountinfo field [not found] ` <20160321234133.GA22463-7LNsyQBKDXoIagZqoN9o3w@public.gmane.org> @ 2016-03-29 1:12 ` Serge E. Hallyn [not found] ` <20160329011203.GA8974-7LNsyQBKDXoIagZqoN9o3w@public.gmane.org> 2016-03-29 13:58 ` Tycho Andersen 2016-04-13 17:57 ` [RFC PATCH] cgroup namespaces: add a 'nsroot=' mountinfo field Tejun Heo 2 siblings, 1 reply; 22+ messages in thread From: Serge E. Hallyn @ 2016-03-29 1:12 UTC (permalink / raw) To: Serge E. Hallyn Cc: tj-DgEjT+Ai2ygdnm+yROfE0A, linux-api-u79uwXL29TY76Z2rM5mHXA, adityakali-hpIqsD4AKlfQT0dZR+AlfA, Linux Containers, Eric W. Biederman, cgroups-u79uwXL29TY76Z2rM5mHXA, lkml Quoting Serge E. Hallyn (serge-A9i7LUbDfNHQT0dZR+AlfA@public.gmane.org): > One practical problem I've found with cgroup namespaces is that there > is no way to disambiguate between a cgroupfs mount which was done in > a cgroup namespace, and a bind mount of a cgroupfs directory. So > whether I do > > unshare --cgroup -- bash -c "mount -t cgroup -o freezer f /mnt; cat /proc/self/mountinfo" > > or whether I just > > mount --bind /sys/fs/cgroup/freezer/$(awk -F: '/freezer/ { print $3 }' /proc/self/cgroup) /mnt > > 'mount root' field (field 3) in /proc/self/mountinfo will show the > same thing, the result of awk -F: '/freezer/ { print $3 }' /proc/self/cgroup. > > This patch adds a 'nsroot=' field to cgroup mountinfo entries, so that > userspace can distinguish a mount made in a cgroup namespace from a bind > mount from a cgroup subdirectory. Hi Tejun, no rush on the patch itself, I don't mind if i have to rewrite it from scratch, but I'd like to get the patch into docker/libcontainer using it, so can we decide on whether the syntax as shown here is ok? thanks, -serge > Signed-off-by: Serge Hallyn <serge.hallyn-GeWIH/nMZzLQT0dZR+AlfA@public.gmane.org> > --- > fs/kernfs/mount.c | 2 +- > include/linux/kernfs.h | 3 ++- > kernel/cgroup.c | 29 ++++++++++++++++++++++++++++- > 3 files changed, 31 insertions(+), 3 deletions(-) > > diff --git a/fs/kernfs/mount.c b/fs/kernfs/mount.c > index b67dbcc..58f59fd 100644 > --- a/fs/kernfs/mount.c > +++ b/fs/kernfs/mount.c > @@ -36,7 +36,7 @@ static int kernfs_sop_show_options(struct seq_file *sf, struct dentry *dentry) > struct kernfs_syscall_ops *scops = root->syscall_ops; > > if (scops && scops->show_options) > - return scops->show_options(sf, root); > + return scops->show_options(sf, dentry, root); > return 0; > } > > diff --git a/include/linux/kernfs.h b/include/linux/kernfs.h > index c06c442..3124b91 100644 > --- a/include/linux/kernfs.h > +++ b/include/linux/kernfs.h > @@ -145,7 +145,8 @@ struct kernfs_node { > */ > struct kernfs_syscall_ops { > int (*remount_fs)(struct kernfs_root *root, int *flags, char *data); > - int (*show_options)(struct seq_file *sf, struct kernfs_root *root); > + int (*show_options)(struct seq_file *sf, struct dentry *dentry, > + struct kernfs_root *root); > > int (*mkdir)(struct kernfs_node *parent, const char *name, > umode_t mode); > diff --git a/kernel/cgroup.c b/kernel/cgroup.c > index 671dc05..806d1e7 100644 > --- a/kernel/cgroup.c > +++ b/kernel/cgroup.c > @@ -1593,7 +1593,32 @@ static int rebind_subsystems(struct cgroup_root *dst_root, u16 ss_mask) > return 0; > } > > -static int cgroup_show_options(struct seq_file *seq, > +static void cgroup_show_nsroot(struct seq_file *seq, struct dentry *dentry, > + struct kernfs_root *kf_root) > +{ > + struct kernfs_node *d_kn = dentry->d_fsdata; > + char *nsroot; > + int len, ret; > + > + if (!kf_root) > + return; > + len = kernfs_path_from_node(d_kn, kf_root->kn, NULL, 0); > + if (len <= 0) > + return; > + nsroot = kzalloc(len + 1, GFP_ATOMIC); > + if (!nsroot) > + return; > + ret = kernfs_path_from_node(d_kn, kf_root->kn, nsroot, len + 1); > + if (ret <= 0 || ret > len) > + goto out; > + > + seq_show_option(seq, "nsroot", nsroot); > + > +out: > + kfree(nsroot); > +} > + > +static int cgroup_show_options(struct seq_file *seq, struct dentry *dentry, > struct kernfs_root *kf_root) > { > struct cgroup_root *root = cgroup_root_from_kf(kf_root); > @@ -1619,6 +1644,8 @@ static int cgroup_show_options(struct seq_file *seq, > seq_puts(seq, ",clone_children"); > if (strlen(root->name)) > seq_show_option(seq, "name", root->name); > + cgroup_show_nsroot(seq, dentry, kf_root); > + > return 0; > } > > -- > 2.7.3 > ^ permalink raw reply [flat|nested] 22+ messages in thread
[parent not found: <20160329011203.GA8974-7LNsyQBKDXoIagZqoN9o3w@public.gmane.org>]
* Re: [RFC PATCH] cgroup namespaces: add a 'nsroot=' mountinfo field [not found] ` <20160329011203.GA8974-7LNsyQBKDXoIagZqoN9o3w@public.gmane.org> @ 2016-03-30 18:58 ` Tejun Heo 0 siblings, 0 replies; 22+ messages in thread From: Tejun Heo @ 2016-03-30 18:58 UTC (permalink / raw) To: Serge E. Hallyn Cc: linux-api-u79uwXL29TY76Z2rM5mHXA, adityakali-hpIqsD4AKlfQT0dZR+AlfA, Linux Containers, Eric W. Biederman, cgroups-u79uwXL29TY76Z2rM5mHXA, lkml Hello, Serge. On Mon, Mar 28, 2016 at 08:12:03PM -0500, Serge E. Hallyn wrote: > Quoting Serge E. Hallyn (serge-A9i7LUbDfNHQT0dZR+AlfA@public.gmane.org): > > One practical problem I've found with cgroup namespaces is that there > > is no way to disambiguate between a cgroupfs mount which was done in > > a cgroup namespace, and a bind mount of a cgroupfs directory. So > > whether I do > > > > unshare --cgroup -- bash -c "mount -t cgroup -o freezer f /mnt; cat /proc/self/mountinfo" > > > > or whether I just > > > > mount --bind /sys/fs/cgroup/freezer/$(awk -F: '/freezer/ { print $3 }' /proc/self/cgroup) /mnt > > > > 'mount root' field (field 3) in /proc/self/mountinfo will show the > > same thing, the result of awk -F: '/freezer/ { print $3 }' /proc/self/cgroup. > > > > This patch adds a 'nsroot=' field to cgroup mountinfo entries, so that > > userspace can distinguish a mount made in a cgroup namespace from a bind > > mount from a cgroup subdirectory. > > no rush on the patch itself, I don't mind if i have to rewrite it from > scratch, but I'd like to get the patch into docker/libcontainer using it, > so can we decide on whether the syntax as shown here is ok? Yeah, I think the syntax is fine. Thanks. -- tejun ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [RFC PATCH] cgroup namespaces: add a 'nsroot=' mountinfo field [not found] ` <20160321234133.GA22463-7LNsyQBKDXoIagZqoN9o3w@public.gmane.org> 2016-03-29 1:12 ` Serge E. Hallyn @ 2016-03-29 13:58 ` Tycho Andersen 2016-03-29 20:00 ` Serge E. Hallyn 2016-04-13 17:57 ` [RFC PATCH] cgroup namespaces: add a 'nsroot=' mountinfo field Tejun Heo 2 siblings, 1 reply; 22+ messages in thread From: Tycho Andersen @ 2016-03-29 13:58 UTC (permalink / raw) To: Serge E. Hallyn Cc: tj-DgEjT+Ai2ygdnm+yROfE0A, linux-api-u79uwXL29TY76Z2rM5mHXA, adityakali-hpIqsD4AKlfQT0dZR+AlfA, Linux Containers, Eric W. Biederman, cgroups-u79uwXL29TY76Z2rM5mHXA, lkml Hi Serge, On Mon, Mar 21, 2016 at 06:41:33PM -0500, Serge E. Hallyn wrote: > One practical problem I've found with cgroup namespaces is that there > is no way to disambiguate between a cgroupfs mount which was done in > a cgroup namespace, and a bind mount of a cgroupfs directory. So > whether I do > > unshare --cgroup -- bash -c "mount -t cgroup -o freezer f /mnt; cat /proc/self/mountinfo" > > or whether I just > > mount --bind /sys/fs/cgroup/freezer/$(awk -F: '/freezer/ { print $3 }' /proc/self/cgroup) /mnt > > 'mount root' field (field 3) in /proc/self/mountinfo will show the > same thing, the result of awk -F: '/freezer/ { print $3 }' /proc/self/cgroup. > > This patch adds a 'nsroot=' field to cgroup mountinfo entries, so that > userspace can distinguish a mount made in a cgroup namespace from a bind > mount from a cgroup subdirectory. With this patch, mountinfo shows nsroot= in the mount options, but the actual mount() call for cgroups doesn't allow nsroot. Would it be possible to allow passing nsroot= to mount, as long is it does in fact match the current nsroot? The motivation for this is that CRIU just copies the mount options and uses them on restore, so with this patch we have to add a special case to trim off nsroot= before we restore. Tycho > Signed-off-by: Serge Hallyn <serge.hallyn-GeWIH/nMZzLQT0dZR+AlfA@public.gmane.org> > --- > fs/kernfs/mount.c | 2 +- > include/linux/kernfs.h | 3 ++- > kernel/cgroup.c | 29 ++++++++++++++++++++++++++++- > 3 files changed, 31 insertions(+), 3 deletions(-) > > diff --git a/fs/kernfs/mount.c b/fs/kernfs/mount.c > index b67dbcc..58f59fd 100644 > --- a/fs/kernfs/mount.c > +++ b/fs/kernfs/mount.c > @@ -36,7 +36,7 @@ static int kernfs_sop_show_options(struct seq_file *sf, struct dentry *dentry) > struct kernfs_syscall_ops *scops = root->syscall_ops; > > if (scops && scops->show_options) > - return scops->show_options(sf, root); > + return scops->show_options(sf, dentry, root); > return 0; > } > > diff --git a/include/linux/kernfs.h b/include/linux/kernfs.h > index c06c442..3124b91 100644 > --- a/include/linux/kernfs.h > +++ b/include/linux/kernfs.h > @@ -145,7 +145,8 @@ struct kernfs_node { > */ > struct kernfs_syscall_ops { > int (*remount_fs)(struct kernfs_root *root, int *flags, char *data); > - int (*show_options)(struct seq_file *sf, struct kernfs_root *root); > + int (*show_options)(struct seq_file *sf, struct dentry *dentry, > + struct kernfs_root *root); > > int (*mkdir)(struct kernfs_node *parent, const char *name, > umode_t mode); > diff --git a/kernel/cgroup.c b/kernel/cgroup.c > index 671dc05..806d1e7 100644 > --- a/kernel/cgroup.c > +++ b/kernel/cgroup.c > @@ -1593,7 +1593,32 @@ static int rebind_subsystems(struct cgroup_root *dst_root, u16 ss_mask) > return 0; > } > > -static int cgroup_show_options(struct seq_file *seq, > +static void cgroup_show_nsroot(struct seq_file *seq, struct dentry *dentry, > + struct kernfs_root *kf_root) > +{ > + struct kernfs_node *d_kn = dentry->d_fsdata; > + char *nsroot; > + int len, ret; > + > + if (!kf_root) > + return; > + len = kernfs_path_from_node(d_kn, kf_root->kn, NULL, 0); > + if (len <= 0) > + return; > + nsroot = kzalloc(len + 1, GFP_ATOMIC); > + if (!nsroot) > + return; > + ret = kernfs_path_from_node(d_kn, kf_root->kn, nsroot, len + 1); > + if (ret <= 0 || ret > len) > + goto out; > + > + seq_show_option(seq, "nsroot", nsroot); > + > +out: > + kfree(nsroot); > +} > + > +static int cgroup_show_options(struct seq_file *seq, struct dentry *dentry, > struct kernfs_root *kf_root) > { > struct cgroup_root *root = cgroup_root_from_kf(kf_root); > @@ -1619,6 +1644,8 @@ static int cgroup_show_options(struct seq_file *seq, > seq_puts(seq, ",clone_children"); > if (strlen(root->name)) > seq_show_option(seq, "name", root->name); > + cgroup_show_nsroot(seq, dentry, kf_root); > + > return 0; > } > > -- > 2.7.3 > ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [RFC PATCH] cgroup namespaces: add a 'nsroot=' mountinfo field 2016-03-29 13:58 ` Tycho Andersen @ 2016-03-29 20:00 ` Serge E. Hallyn [not found] ` <20160329200018.GA21908-7LNsyQBKDXoIagZqoN9o3w@public.gmane.org> 0 siblings, 1 reply; 22+ messages in thread From: Serge E. Hallyn @ 2016-03-29 20:00 UTC (permalink / raw) To: Tycho Andersen Cc: Serge E. Hallyn, tj-DgEjT+Ai2ygdnm+yROfE0A, linux-api-u79uwXL29TY76Z2rM5mHXA, adityakali-hpIqsD4AKlfQT0dZR+AlfA, Linux Containers, Eric W. Biederman, cgroups-u79uwXL29TY76Z2rM5mHXA, lkml Quoting Tycho Andersen (tycho.andersen-Z7WLFzj8eWMS+FvcfC7Uqw@public.gmane.org): > Hi Serge, > > On Mon, Mar 21, 2016 at 06:41:33PM -0500, Serge E. Hallyn wrote: > > One practical problem I've found with cgroup namespaces is that there > > is no way to disambiguate between a cgroupfs mount which was done in > > a cgroup namespace, and a bind mount of a cgroupfs directory. So > > whether I do > > > > unshare --cgroup -- bash -c "mount -t cgroup -o freezer f /mnt; cat /proc/self/mountinfo" > > > > or whether I just > > > > mount --bind /sys/fs/cgroup/freezer/$(awk -F: '/freezer/ { print $3 }' /proc/self/cgroup) /mnt > > > > 'mount root' field (field 3) in /proc/self/mountinfo will show the > > same thing, the result of awk -F: '/freezer/ { print $3 }' /proc/self/cgroup. > > > > This patch adds a 'nsroot=' field to cgroup mountinfo entries, so that > > userspace can distinguish a mount made in a cgroup namespace from a bind > > mount from a cgroup subdirectory. > > With this patch, mountinfo shows nsroot= in the mount options, but the > actual mount() call for cgroups doesn't allow nsroot. Would it be > possible to allow passing nsroot= to mount, as long is it does in fact > match the current nsroot? Yeah, that should be possible. I'll try to send a patch for that later this week. That's not to say Tejun will be ok with the behavior, but it seems to make sense to me. > The motivation for this is that CRIU just copies the mount options and > uses them on restore, so with this patch we have to add a special case > to trim off nsroot= before we restore. > > Tycho > > > Signed-off-by: Serge Hallyn <serge.hallyn-GeWIH/nMZzLQT0dZR+AlfA@public.gmane.org> > > --- > > fs/kernfs/mount.c | 2 +- > > include/linux/kernfs.h | 3 ++- > > kernel/cgroup.c | 29 ++++++++++++++++++++++++++++- > > 3 files changed, 31 insertions(+), 3 deletions(-) > > > > diff --git a/fs/kernfs/mount.c b/fs/kernfs/mount.c > > index b67dbcc..58f59fd 100644 > > --- a/fs/kernfs/mount.c > > +++ b/fs/kernfs/mount.c > > @@ -36,7 +36,7 @@ static int kernfs_sop_show_options(struct seq_file *sf, struct dentry *dentry) > > struct kernfs_syscall_ops *scops = root->syscall_ops; > > > > if (scops && scops->show_options) > > - return scops->show_options(sf, root); > > + return scops->show_options(sf, dentry, root); > > return 0; > > } > > > > diff --git a/include/linux/kernfs.h b/include/linux/kernfs.h > > index c06c442..3124b91 100644 > > --- a/include/linux/kernfs.h > > +++ b/include/linux/kernfs.h > > @@ -145,7 +145,8 @@ struct kernfs_node { > > */ > > struct kernfs_syscall_ops { > > int (*remount_fs)(struct kernfs_root *root, int *flags, char *data); > > - int (*show_options)(struct seq_file *sf, struct kernfs_root *root); > > + int (*show_options)(struct seq_file *sf, struct dentry *dentry, > > + struct kernfs_root *root); > > > > int (*mkdir)(struct kernfs_node *parent, const char *name, > > umode_t mode); > > diff --git a/kernel/cgroup.c b/kernel/cgroup.c > > index 671dc05..806d1e7 100644 > > --- a/kernel/cgroup.c > > +++ b/kernel/cgroup.c > > @@ -1593,7 +1593,32 @@ static int rebind_subsystems(struct cgroup_root *dst_root, u16 ss_mask) > > return 0; > > } > > > > -static int cgroup_show_options(struct seq_file *seq, > > +static void cgroup_show_nsroot(struct seq_file *seq, struct dentry *dentry, > > + struct kernfs_root *kf_root) > > +{ > > + struct kernfs_node *d_kn = dentry->d_fsdata; > > + char *nsroot; > > + int len, ret; > > + > > + if (!kf_root) > > + return; > > + len = kernfs_path_from_node(d_kn, kf_root->kn, NULL, 0); > > + if (len <= 0) > > + return; > > + nsroot = kzalloc(len + 1, GFP_ATOMIC); > > + if (!nsroot) > > + return; > > + ret = kernfs_path_from_node(d_kn, kf_root->kn, nsroot, len + 1); > > + if (ret <= 0 || ret > len) > > + goto out; > > + > > + seq_show_option(seq, "nsroot", nsroot); > > + > > +out: > > + kfree(nsroot); > > +} > > + > > +static int cgroup_show_options(struct seq_file *seq, struct dentry *dentry, > > struct kernfs_root *kf_root) > > { > > struct cgroup_root *root = cgroup_root_from_kf(kf_root); > > @@ -1619,6 +1644,8 @@ static int cgroup_show_options(struct seq_file *seq, > > seq_puts(seq, ",clone_children"); > > if (strlen(root->name)) > > seq_show_option(seq, "name", root->name); > > + cgroup_show_nsroot(seq, dentry, kf_root); > > + > > return 0; > > } > > > > -- > > 2.7.3 > > ^ permalink raw reply [flat|nested] 22+ messages in thread
[parent not found: <20160329200018.GA21908-7LNsyQBKDXoIagZqoN9o3w@public.gmane.org>]
* [PATCH] cgroup mount: ignore nsroot= [not found] ` <20160329200018.GA21908-7LNsyQBKDXoIagZqoN9o3w@public.gmane.org> @ 2016-03-30 17:21 ` Serge E. Hallyn [not found] ` <20160330172100.GA11373-7LNsyQBKDXoIagZqoN9o3w@public.gmane.org> 0 siblings, 1 reply; 22+ messages in thread From: Serge E. Hallyn @ 2016-03-30 17:21 UTC (permalink / raw) To: Serge E. Hallyn Cc: Tycho Andersen, tj-DgEjT+Ai2ygdnm+yROfE0A, linux-api-u79uwXL29TY76Z2rM5mHXA, adityakali-hpIqsD4AKlfQT0dZR+AlfA, Linux Containers, Eric W. Biederman, cgroups-u79uwXL29TY76Z2rM5mHXA, lkml As of the patch "cgroup namespaces: add a 'nsroot=' mountinfo field", cgroupfs mountinfo output shows 'nsroot='. If userspace like criu copy/pastes mount options from there into a new mount command, we should ignore it. Signed-off-by: Serge Hallyn <serge.hallyn-GeWIH/nMZzLQT0dZR+AlfA@public.gmane.org> --- kernel/cgroup.c | 4 ++++ 1 file changed, 4 insertions(+) diff --git a/kernel/cgroup.c b/kernel/cgroup.c index ef0c25d..69fb112 100644 --- a/kernel/cgroup.c +++ b/kernel/cgroup.c @@ -1680,6 +1680,10 @@ static int parse_cgroupfs_options(char *data, struct cgroup_sb_opts *opts) opts->none = true; continue; } + if (!strncmp(token, "nsroot=", 7)) { + /* ignore nsroot= copied from mountinfo */ + continue; + } if (!strcmp(token, "all")) { /* Mutually exclusive option 'all' + subsystem name */ if (one_ss) -- 2.7.0 ^ permalink raw reply related [flat|nested] 22+ messages in thread
[parent not found: <20160330172100.GA11373-7LNsyQBKDXoIagZqoN9o3w@public.gmane.org>]
* Re: [PATCH] cgroup mount: ignore nsroot= [not found] ` <20160330172100.GA11373-7LNsyQBKDXoIagZqoN9o3w@public.gmane.org> @ 2016-03-30 18:09 ` Tycho Andersen 0 siblings, 0 replies; 22+ messages in thread From: Tycho Andersen @ 2016-03-30 18:09 UTC (permalink / raw) To: Serge E. Hallyn Cc: tj-DgEjT+Ai2ygdnm+yROfE0A, linux-api-u79uwXL29TY76Z2rM5mHXA, adityakali-hpIqsD4AKlfQT0dZR+AlfA, Linux Containers, Eric W. Biederman, cgroups-u79uwXL29TY76Z2rM5mHXA, lkml On Wed, Mar 30, 2016 at 12:21:00PM -0500, Serge E. Hallyn wrote: > As of the patch "cgroup namespaces: add a 'nsroot=' mountinfo field", > cgroupfs mountinfo output shows 'nsroot='. If userspace like criu > copy/pastes mount options from there into a new mount command, we should > ignore it. > > Signed-off-by: Serge Hallyn <serge.hallyn-GeWIH/nMZzLQT0dZR+AlfA@public.gmane.org> Tested-by: Tycho Andersen <tycho.andersen-Z7WLFzj8eWMS+FvcfC7Uqw@public.gmane.org> Thanks, Serge. > --- > kernel/cgroup.c | 4 ++++ > 1 file changed, 4 insertions(+) > > diff --git a/kernel/cgroup.c b/kernel/cgroup.c > index ef0c25d..69fb112 100644 > --- a/kernel/cgroup.c > +++ b/kernel/cgroup.c > @@ -1680,6 +1680,10 @@ static int parse_cgroupfs_options(char *data, struct cgroup_sb_opts *opts) > opts->none = true; > continue; > } > + if (!strncmp(token, "nsroot=", 7)) { > + /* ignore nsroot= copied from mountinfo */ > + continue; > + } > if (!strcmp(token, "all")) { > /* Mutually exclusive option 'all' + subsystem name */ > if (one_ss) > -- > 2.7.0 > ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [RFC PATCH] cgroup namespaces: add a 'nsroot=' mountinfo field [not found] ` <20160321234133.GA22463-7LNsyQBKDXoIagZqoN9o3w@public.gmane.org> 2016-03-29 1:12 ` Serge E. Hallyn 2016-03-29 13:58 ` Tycho Andersen @ 2016-04-13 17:57 ` Tejun Heo [not found] ` <20160413175736.GC3676-piEFEHQLUPpN0TnZuCh8vA@public.gmane.org> 2 siblings, 1 reply; 22+ messages in thread From: Tejun Heo @ 2016-04-13 17:57 UTC (permalink / raw) To: Serge E. Hallyn Cc: linux-api-u79uwXL29TY76Z2rM5mHXA, adityakali-hpIqsD4AKlfQT0dZR+AlfA, Linux Containers, Eric W. Biederman, cgroups-u79uwXL29TY76Z2rM5mHXA, lkml Hello, Serge. Sorry about the delay. On Mon, Mar 21, 2016 at 06:41:33PM -0500, Serge E. Hallyn wrote: > struct kernfs_syscall_ops { > int (*remount_fs)(struct kernfs_root *root, int *flags, char *data); > - int (*show_options)(struct seq_file *sf, struct kernfs_root *root); > + int (*show_options)(struct seq_file *sf, struct dentry *dentry, > + struct kernfs_root *root); Wouldn't it make more sense to pass in kernfs_node pointer instead of dentry pointer? > +static void cgroup_show_nsroot(struct seq_file *seq, struct dentry *dentry, > + struct kernfs_root *kf_root) > +{ > + struct kernfs_node *d_kn = dentry->d_fsdata; > + char *nsroot; > + int len, ret; > + > + if (!kf_root) > + return; > + len = kernfs_path_from_node(d_kn, kf_root->kn, NULL, 0); > + if (len <= 0) > + return; > + nsroot = kzalloc(len + 1, GFP_ATOMIC); > + if (!nsroot) > + return; > + ret = kernfs_path_from_node(d_kn, kf_root->kn, nsroot, len + 1); > + if (ret <= 0 || ret > len) > + goto out; Hmmm.... does this mean that someone inside cgroup ns would be able to find out the absolute cgroup path of the ns root from inside? If so, wouldn't that be an unnecessary information leak? Thanks. -- tejun ^ permalink raw reply [flat|nested] 22+ messages in thread
[parent not found: <20160413175736.GC3676-piEFEHQLUPpN0TnZuCh8vA@public.gmane.org>]
* Re: [RFC PATCH] cgroup namespaces: add a 'nsroot=' mountinfo field [not found] ` <20160413175736.GC3676-piEFEHQLUPpN0TnZuCh8vA@public.gmane.org> @ 2016-04-13 18:46 ` Serge E. Hallyn [not found] ` <20160413184639.GA29483-7LNsyQBKDXoIagZqoN9o3w@public.gmane.org> 2016-04-14 4:04 ` [PATCH] " Serge E. Hallyn 1 sibling, 1 reply; 22+ messages in thread From: Serge E. Hallyn @ 2016-04-13 18:46 UTC (permalink / raw) To: Tejun Heo Cc: Serge E. Hallyn, linux-api-u79uwXL29TY76Z2rM5mHXA, adityakali-hpIqsD4AKlfQT0dZR+AlfA, Linux Containers, Eric W. Biederman, cgroups-u79uwXL29TY76Z2rM5mHXA, lkml Quoting Tejun Heo (tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org): > Hello, Serge. > > Sorry about the delay. > > On Mon, Mar 21, 2016 at 06:41:33PM -0500, Serge E. Hallyn wrote: > > struct kernfs_syscall_ops { > > int (*remount_fs)(struct kernfs_root *root, int *flags, char *data); > > - int (*show_options)(struct seq_file *sf, struct kernfs_root *root); > > + int (*show_options)(struct seq_file *sf, struct dentry *dentry, > > + struct kernfs_root *root); > > Wouldn't it make more sense to pass in kernfs_node pointer instead of > dentry pointer? Yeah that definately seems better. > > +static void cgroup_show_nsroot(struct seq_file *seq, struct dentry *dentry, > > + struct kernfs_root *kf_root) > > +{ > > + struct kernfs_node *d_kn = dentry->d_fsdata; > > + char *nsroot; > > + int len, ret; > > + > > + if (!kf_root) > > + return; > > + len = kernfs_path_from_node(d_kn, kf_root->kn, NULL, 0); > > + if (len <= 0) > > + return; > > + nsroot = kzalloc(len + 1, GFP_ATOMIC); > > + if (!nsroot) > > + return; > > + ret = kernfs_path_from_node(d_kn, kf_root->kn, nsroot, len + 1); > > + if (ret <= 0 || ret > len) > > + goto out; > > Hmmm.... does this mean that someone inside cgroup ns would be able to > find out the absolute cgroup path of the ns root from inside? If so, > wouldn't that be an unnecessary information leak? It's not a leak of any information we're trying to hide. I realize something like 8 years have passed, but I still basically go by the ksummit guidance that containers are ok but the kernel's first priority is to facilitate containers but not trick containers into thinking they're not containerized. So long as the container is properly set up, I don't think there's anything the workload could do with the nsroot= info other than *know* that it is in a ns cgroup. If we did change that guidance, there's a slew of proc info that we could better virtualize :) thanks, -serge ^ permalink raw reply [flat|nested] 22+ messages in thread
[parent not found: <20160413184639.GA29483-7LNsyQBKDXoIagZqoN9o3w@public.gmane.org>]
* Re: [RFC PATCH] cgroup namespaces: add a 'nsroot=' mountinfo field [not found] ` <20160413184639.GA29483-7LNsyQBKDXoIagZqoN9o3w@public.gmane.org> @ 2016-04-13 18:50 ` Tejun Heo [not found] ` <20160413185033.GH3676-piEFEHQLUPpN0TnZuCh8vA@public.gmane.org> 0 siblings, 1 reply; 22+ messages in thread From: Tejun Heo @ 2016-04-13 18:50 UTC (permalink / raw) To: Serge E. Hallyn Cc: linux-api-u79uwXL29TY76Z2rM5mHXA, adityakali-hpIqsD4AKlfQT0dZR+AlfA, Linux Containers, Eric W. Biederman, cgroups-u79uwXL29TY76Z2rM5mHXA, lkml Hello, Serge. On Wed, Apr 13, 2016 at 01:46:39PM -0500, Serge E. Hallyn wrote: > It's not a leak of any information we're trying to hide. I realize > something like 8 years have passed, but I still basically go by the > ksummit guidance that containers are ok but the kernel's first priority > is to facilitate containers but not trick containers into thinking > they're not containerized. So long as the container is properly set > up, I don't think there's anything the workload could do with the > nsroot= info other than *know* that it is in a ns cgroup. > > If we did change that guidance, there's a slew of proc info that we > could better virtualize :) I see. I'm just wondering because the information here seems a bit gratuituous. Isn't the only thing necessary telling whether the root is bind mounted or namescoped? Wouldn't simple "nsroot" work for that purpose? Thanks. -- tejun ^ permalink raw reply [flat|nested] 22+ messages in thread
[parent not found: <20160413185033.GH3676-piEFEHQLUPpN0TnZuCh8vA@public.gmane.org>]
* Re: [RFC PATCH] cgroup namespaces: add a 'nsroot=' mountinfo field [not found] ` <20160413185033.GH3676-piEFEHQLUPpN0TnZuCh8vA@public.gmane.org> @ 2016-04-13 19:01 ` Serge E. Hallyn [not found] ` <20160413190152.GA29753-7LNsyQBKDXoIagZqoN9o3w@public.gmane.org> 0 siblings, 1 reply; 22+ messages in thread From: Serge E. Hallyn @ 2016-04-13 19:01 UTC (permalink / raw) To: Tejun Heo Cc: Serge E. Hallyn, linux-api-u79uwXL29TY76Z2rM5mHXA, adityakali-hpIqsD4AKlfQT0dZR+AlfA, Linux Containers, Eric W. Biederman, cgroups-u79uwXL29TY76Z2rM5mHXA, lkml Quoting Tejun Heo (tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org): > Hello, Serge. > > On Wed, Apr 13, 2016 at 01:46:39PM -0500, Serge E. Hallyn wrote: > > It's not a leak of any information we're trying to hide. I realize > > something like 8 years have passed, but I still basically go by the > > ksummit guidance that containers are ok but the kernel's first priority > > is to facilitate containers but not trick containers into thinking > > they're not containerized. So long as the container is properly set > > up, I don't think there's anything the workload could do with the > > nsroot= info other than *know* that it is in a ns cgroup. > > > > If we did change that guidance, there's a slew of proc info that we > > could better virtualize :) > > I see. I'm just wondering because the information here seems a bit > gratuituous. Isn't the only thing necessary telling whether the root > is bind mounted or namescoped? Wouldn't simple "nsroot" work for that > purpose? I don't think so - we could be in a cgroup namespace but still have access only to bind-mounted cgroups. So we need to compare the superblock dentry root field to the nsroot= value. ^ permalink raw reply [flat|nested] 22+ messages in thread
[parent not found: <20160413190152.GA29753-7LNsyQBKDXoIagZqoN9o3w@public.gmane.org>]
* Re: [RFC PATCH] cgroup namespaces: add a 'nsroot=' mountinfo field [not found] ` <20160413190152.GA29753-7LNsyQBKDXoIagZqoN9o3w@public.gmane.org> @ 2016-04-13 19:12 ` Tejun Heo 2016-04-13 23:31 ` Aditya Kali 1 sibling, 0 replies; 22+ messages in thread From: Tejun Heo @ 2016-04-13 19:12 UTC (permalink / raw) To: Serge E. Hallyn Cc: linux-api-u79uwXL29TY76Z2rM5mHXA, adityakali-hpIqsD4AKlfQT0dZR+AlfA, Linux Containers, Eric W. Biederman, cgroups-u79uwXL29TY76Z2rM5mHXA, lkml Hello, On Wed, Apr 13, 2016 at 02:01:52PM -0500, Serge E. Hallyn wrote: > I don't think so - we could be in a cgroup namespace but still have > access only to bind-mounted cgroups. So we need to compare the > superblock dentry root field to the nsroot= value. I see. No objection then. Thanks. -- tejun ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [RFC PATCH] cgroup namespaces: add a 'nsroot=' mountinfo field [not found] ` <20160413190152.GA29753-7LNsyQBKDXoIagZqoN9o3w@public.gmane.org> 2016-04-13 19:12 ` Tejun Heo @ 2016-04-13 23:31 ` Aditya Kali [not found] ` <CAGr1F2HXJ1BdMFY+vF40O_khE+4S7OnbQPv-h1Q_AmGGhL7mzw-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org> 1 sibling, 1 reply; 22+ messages in thread From: Aditya Kali @ 2016-04-13 23:31 UTC (permalink / raw) To: Serge E. Hallyn Cc: Tejun Heo, Linux API, Linux Containers, Eric W. Biederman, cgroups mailinglist, lkml On Wed, Apr 13, 2016 at 12:01 PM, Serge E. Hallyn <serge-A9i7LUbDfNHQT0dZR+AlfA@public.gmane.org> wrote: > Quoting Tejun Heo (tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org): >> Hello, Serge. >> >> On Wed, Apr 13, 2016 at 01:46:39PM -0500, Serge E. Hallyn wrote: >> > It's not a leak of any information we're trying to hide. I realize >> > something like 8 years have passed, but I still basically go by the >> > ksummit guidance that containers are ok but the kernel's first priority >> > is to facilitate containers but not trick containers into thinking >> > they're not containerized. So long as the container is properly set >> > up, I don't think there's anything the workload could do with the >> > nsroot= info other than *know* that it is in a ns cgroup. >> > >> > If we did change that guidance, there's a slew of proc info that we >> > could better virtualize :) >> >> I see. I'm just wondering because the information here seems a bit >> gratuituous. Isn't the only thing necessary telling whether the root >> is bind mounted or namescoped? Wouldn't simple "nsroot" work for that >> purpose? > > I don't think so - we could be in a cgroup namespace but still have > access only to bind-mounted cgroups. So we need to compare the > superblock dentry root field to the nsroot= value. Umm, I don't think this is such a good idea. The main purpose of cgroup namespace was to prevent this exposure of system cgroup hierarchy that used to happen because of /proc/self/cgroup. Wouldn't showing that information in /proc/self/mountinfo defeat the purpose? > One practical problem I've found with cgroup namespaces is that there > is no way to disambiguate between a cgroupfs mount which was done in > a cgroup namespace, and a bind mount of a cgroupfs directory. Thats actually by design, no? Namespaced apps should not know/care if they are running inside namespace. If they can find it out today, its just because of certain side-effects. I fear adding explicit "nsroot" or something in /proc/self/mountinfo now becomes an API making it hard to virtualize user-apps again. -- Aditya ^ permalink raw reply [flat|nested] 22+ messages in thread
[parent not found: <CAGr1F2HXJ1BdMFY+vF40O_khE+4S7OnbQPv-h1Q_AmGGhL7mzw-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>]
* Re: [RFC PATCH] cgroup namespaces: add a 'nsroot=' mountinfo field [not found] ` <CAGr1F2HXJ1BdMFY+vF40O_khE+4S7OnbQPv-h1Q_AmGGhL7mzw-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org> @ 2016-04-13 23:52 ` Serge E. Hallyn 0 siblings, 0 replies; 22+ messages in thread From: Serge E. Hallyn @ 2016-04-13 23:52 UTC (permalink / raw) To: Aditya Kali Cc: Serge E. Hallyn, Tejun Heo, Linux API, Linux Containers, Eric W. Biederman, cgroups mailinglist, lkml Quoting Aditya Kali (adityakali-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org): > On Wed, Apr 13, 2016 at 12:01 PM, Serge E. Hallyn <serge-A9i7LUbDfNHQT0dZR+AlfA@public.gmane.org> wrote: > > Quoting Tejun Heo (tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org): > >> Hello, Serge. > >> > >> On Wed, Apr 13, 2016 at 01:46:39PM -0500, Serge E. Hallyn wrote: > >> > It's not a leak of any information we're trying to hide. I realize > >> > something like 8 years have passed, but I still basically go by the > >> > ksummit guidance that containers are ok but the kernel's first priority > >> > is to facilitate containers but not trick containers into thinking > >> > they're not containerized. So long as the container is properly set > >> > up, I don't think there's anything the workload could do with the > >> > nsroot= info other than *know* that it is in a ns cgroup. > >> > > >> > If we did change that guidance, there's a slew of proc info that we > >> > could better virtualize :) > >> > >> I see. I'm just wondering because the information here seems a bit > >> gratuituous. Isn't the only thing necessary telling whether the root > >> is bind mounted or namescoped? Wouldn't simple "nsroot" work for that > >> purpose? > > > > I don't think so - we could be in a cgroup namespace but still have > > access only to bind-mounted cgroups. So we need to compare the > > superblock dentry root field to the nsroot= value. > > Umm, I don't think this is such a good idea. The main purpose of > cgroup namespace was to prevent this exposure of system cgroup > hierarchy that used to happen because of /proc/self/cgroup. Wouldn't > showing that information in /proc/self/mountinfo defeat the purpose? I disagree. The primary purpose was to simplify init's job and to keep cgroup mounts in sync with /proc/self/cgroup. So that userspace doesn't have to look at /proc/self/cgroup and then try and figure out how that relates to its actual cgroup mountpoints. It was not to *hide* the information. Field 3 already gives us the path, nsroot just tells us what part of it we are namespaced under. > > One practical problem I've found with cgroup namespaces is that there > > is no way to disambiguate between a cgroupfs mount which was done in > > a cgroup namespace, and a bind mount of a cgroupfs directory. > > Thats actually by design, no? Namespaced apps should not know/care if > they are running inside namespace. If they can find it out today, its No. If a workload isn't allowed to mount its own cgroups, and can only see that freezer /lxc/x1 was mounted at /dev/cgroup (poorly done, but we don't get to pass judgement or choose mountpoints for userspace), and it sees /lxc/x1 in its freezer entry for /proc/self/cgroup, then it cannot tell whether it should be using /dev/cgroup/tasks or /dev/cgroup/lxc/tasks or /dev/cgroup/lxc/x1/tasks. That's a problem. > just because of certain side-effects. I fear adding explicit "nsroot" > or something in /proc/self/mountinfo now becomes an API making it hard > to virtualize user-apps again. It doesn't make it hard to virtualize. The only complication would be if you wanted to checkpoint/restart and reproduce the exact /proc/self/mountinfo output. That's a bogus goal anyway, since the restart could be in a different cgroup and field 3 would be different. In contrast, not providing this makes it impossible for software to deal with both cgroup namespace and any bind-mounted cgroups. Which means any new docker (say) which can run in cgroup namespaces will not be able to run under old (that is, anything currently released except lxc 2.0) container managers. We're breaking all container managers. Now the other thing we could do would be to tweak field 3 in the mountinfo output. That had been my first inclination, but the way the mountinfo code is currently done makes that ... challenging. -serge ^ permalink raw reply [flat|nested] 22+ messages in thread
* [PATCH] cgroup namespaces: add a 'nsroot=' mountinfo field [not found] ` <20160413175736.GC3676-piEFEHQLUPpN0TnZuCh8vA@public.gmane.org> 2016-04-13 18:46 ` Serge E. Hallyn @ 2016-04-14 4:04 ` Serge E. Hallyn [not found] ` <20160414040436.GA3739-7LNsyQBKDXoIagZqoN9o3w@public.gmane.org> 1 sibling, 1 reply; 22+ messages in thread From: Serge E. Hallyn @ 2016-04-14 4:04 UTC (permalink / raw) To: Tejun Heo Cc: Serge E. Hallyn, linux-api-u79uwXL29TY76Z2rM5mHXA, adityakali-hpIqsD4AKlfQT0dZR+AlfA, Linux Containers, Eric W. Biederman, cgroups-u79uwXL29TY76Z2rM5mHXA, lkml This is so that userspace can distinguish a mount made in a cgroup namespace from a bind mount from a cgroup subdirectory. Signed-off-by: Serge Hallyn <serge.hallyn-GeWIH/nMZzLQT0dZR+AlfA@public.gmane.org> --- Changelog: 2016-04-13: pass kernfs_node rather than dentry to show_options --- fs/kernfs/mount.c | 2 +- include/linux/kernfs.h | 3 ++- kernel/cgroup.c | 28 +++++++++++++++++++++++++++- 3 files changed, 30 insertions(+), 3 deletions(-) diff --git a/fs/kernfs/mount.c b/fs/kernfs/mount.c index f73541f..58e8a86 100644 --- a/fs/kernfs/mount.c +++ b/fs/kernfs/mount.c @@ -36,7 +36,7 @@ static int kernfs_sop_show_options(struct seq_file *sf, struct dentry *dentry) struct kernfs_syscall_ops *scops = root->syscall_ops; if (scops && scops->show_options) - return scops->show_options(sf, root); + return scops->show_options(sf, dentry->d_fsdata, root); return 0; } diff --git a/include/linux/kernfs.h b/include/linux/kernfs.h index c06c442..72b4081 100644 --- a/include/linux/kernfs.h +++ b/include/linux/kernfs.h @@ -145,7 +145,8 @@ struct kernfs_node { */ struct kernfs_syscall_ops { int (*remount_fs)(struct kernfs_root *root, int *flags, char *data); - int (*show_options)(struct seq_file *sf, struct kernfs_root *root); + int (*show_options)(struct seq_file *sf, struct kernfs_node *kn, + struct kernfs_root *root); int (*mkdir)(struct kernfs_node *parent, const char *name, umode_t mode); diff --git a/kernel/cgroup.c b/kernel/cgroup.c index 671dc05..4d26d07 100644 --- a/kernel/cgroup.c +++ b/kernel/cgroup.c @@ -1593,7 +1593,31 @@ static int rebind_subsystems(struct cgroup_root *dst_root, u16 ss_mask) return 0; } -static int cgroup_show_options(struct seq_file *seq, +static void cgroup_show_nsroot(struct seq_file *seq, struct kernfs_node *knode, + struct kernfs_root *kroot) +{ + char *nsroot; + int len, ret; + + if (!kroot) + return; + len = kernfs_path_from_node(knode, kroot->kn, NULL, 0); + if (len <= 0) + return; + nsroot = kzalloc(len + 1, GFP_ATOMIC); + if (!nsroot) + return; + ret = kernfs_path_from_node(knode, kroot->kn, nsroot, len + 1); + if (ret <= 0 || ret > len) + goto out; + + seq_show_option(seq, "nsroot", nsroot); + +out: + kfree(nsroot); +} + +static int cgroup_show_options(struct seq_file *seq, struct kernfs_node *kn, struct kernfs_root *kf_root) { struct cgroup_root *root = cgroup_root_from_kf(kf_root); @@ -1619,6 +1643,8 @@ static int cgroup_show_options(struct seq_file *seq, seq_puts(seq, ",clone_children"); if (strlen(root->name)) seq_show_option(seq, "name", root->name); + cgroup_show_nsroot(seq, kn, kf_root); + return 0; } -- 2.7.4 ^ permalink raw reply related [flat|nested] 22+ messages in thread
[parent not found: <20160414040436.GA3739-7LNsyQBKDXoIagZqoN9o3w@public.gmane.org>]
* Re: [PATCH] cgroup namespaces: add a 'nsroot=' mountinfo field [not found] ` <20160414040436.GA3739-7LNsyQBKDXoIagZqoN9o3w@public.gmane.org> @ 2016-04-14 14:42 ` Eric W. Biederman [not found] ` <87oa9c6ymf.fsf-JOvCrm2gF+uungPnsOpG7nhyD016LWXt@public.gmane.org> 0 siblings, 1 reply; 22+ messages in thread From: Eric W. Biederman @ 2016-04-14 14:42 UTC (permalink / raw) To: Serge E. Hallyn Cc: Tejun Heo, linux-api-u79uwXL29TY76Z2rM5mHXA, adityakali-hpIqsD4AKlfQT0dZR+AlfA, Linux Containers, cgroups-u79uwXL29TY76Z2rM5mHXA, lkml "Serge E. Hallyn" <serge-A9i7LUbDfNHQT0dZR+AlfA@public.gmane.org> writes: > This is so that userspace can distinguish a mount made in a cgroup > namespace from a bind mount from a cgroup subdirectory. To do that do you need to print the path, or is an extra option that reveals nothing except that it was a cgroup mount sufficient? Is there any practical difference between a mount in a namespace and a bind mount? Given the way the conversation has been going I think it would be good to see the answers to these questions. Perhaps I missed it but I haven't seen the answers to those questions. Eric > > Signed-off-by: Serge Hallyn <serge.hallyn-GeWIH/nMZzLQT0dZR+AlfA@public.gmane.org> > --- > Changelog: 2016-04-13: pass kernfs_node rather than dentry to show_options > --- > fs/kernfs/mount.c | 2 +- > include/linux/kernfs.h | 3 ++- > kernel/cgroup.c | 28 +++++++++++++++++++++++++++- > 3 files changed, 30 insertions(+), 3 deletions(-) > > diff --git a/fs/kernfs/mount.c b/fs/kernfs/mount.c > index f73541f..58e8a86 100644 > --- a/fs/kernfs/mount.c > +++ b/fs/kernfs/mount.c > @@ -36,7 +36,7 @@ static int kernfs_sop_show_options(struct seq_file *sf, struct dentry *dentry) > struct kernfs_syscall_ops *scops = root->syscall_ops; > > if (scops && scops->show_options) > - return scops->show_options(sf, root); > + return scops->show_options(sf, dentry->d_fsdata, root); > return 0; > } > > diff --git a/include/linux/kernfs.h b/include/linux/kernfs.h > index c06c442..72b4081 100644 > --- a/include/linux/kernfs.h > +++ b/include/linux/kernfs.h > @@ -145,7 +145,8 @@ struct kernfs_node { > */ > struct kernfs_syscall_ops { > int (*remount_fs)(struct kernfs_root *root, int *flags, char *data); > - int (*show_options)(struct seq_file *sf, struct kernfs_root *root); > + int (*show_options)(struct seq_file *sf, struct kernfs_node *kn, > + struct kernfs_root *root); > > int (*mkdir)(struct kernfs_node *parent, const char *name, > umode_t mode); > diff --git a/kernel/cgroup.c b/kernel/cgroup.c > index 671dc05..4d26d07 100644 > --- a/kernel/cgroup.c > +++ b/kernel/cgroup.c > @@ -1593,7 +1593,31 @@ static int rebind_subsystems(struct cgroup_root *dst_root, u16 ss_mask) > return 0; > } > > -static int cgroup_show_options(struct seq_file *seq, > +static void cgroup_show_nsroot(struct seq_file *seq, struct kernfs_node *knode, > + struct kernfs_root *kroot) > +{ > + char *nsroot; > + int len, ret; > + > + if (!kroot) > + return; > + len = kernfs_path_from_node(knode, kroot->kn, NULL, 0); > + if (len <= 0) > + return; > + nsroot = kzalloc(len + 1, GFP_ATOMIC); > + if (!nsroot) > + return; > + ret = kernfs_path_from_node(knode, kroot->kn, nsroot, len + 1); > + if (ret <= 0 || ret > len) > + goto out; > + > + seq_show_option(seq, "nsroot", nsroot); > + > +out: > + kfree(nsroot); > +} > + > +static int cgroup_show_options(struct seq_file *seq, struct kernfs_node *kn, > struct kernfs_root *kf_root) > { > struct cgroup_root *root = cgroup_root_from_kf(kf_root); > @@ -1619,6 +1643,8 @@ static int cgroup_show_options(struct seq_file *seq, > seq_puts(seq, ",clone_children"); > if (strlen(root->name)) > seq_show_option(seq, "name", root->name); > + cgroup_show_nsroot(seq, kn, kf_root); > + > return 0; > } ^ permalink raw reply [flat|nested] 22+ messages in thread
[parent not found: <87oa9c6ymf.fsf-JOvCrm2gF+uungPnsOpG7nhyD016LWXt@public.gmane.org>]
* Re: [PATCH] cgroup namespaces: add a 'nsroot=' mountinfo field [not found] ` <87oa9c6ymf.fsf-JOvCrm2gF+uungPnsOpG7nhyD016LWXt@public.gmane.org> @ 2016-04-14 15:27 ` Serge E. Hallyn [not found] ` <20160414152747.GA12700-7LNsyQBKDXoIagZqoN9o3w@public.gmane.org> 2016-04-15 15:50 ` Aditya Kali 0 siblings, 2 replies; 22+ messages in thread From: Serge E. Hallyn @ 2016-04-14 15:27 UTC (permalink / raw) To: Eric W. Biederman Cc: Serge E. Hallyn, Tejun Heo, linux-api-u79uwXL29TY76Z2rM5mHXA, adityakali-hpIqsD4AKlfQT0dZR+AlfA, Linux Containers, cgroups-u79uwXL29TY76Z2rM5mHXA, lkml Quoting Eric W. Biederman (ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org): > "Serge E. Hallyn" <serge-A9i7LUbDfNHQT0dZR+AlfA@public.gmane.org> writes: > > > This is so that userspace can distinguish a mount made in a cgroup > > namespace from a bind mount from a cgroup subdirectory. > > To do that do you need to print the path, or is an extra option that > reveals nothing except that it was a cgroup mount sufficient? > > Is there any practical difference between a mount in a namespace and a > bind mount? > > Given the way the conversation has been going I think it would be good > to see the answers to these questions. Perhaps I missed it but I > haven't seen the answers to those questions. Yup, I tried to answer those in my last email, let me try again. Let's say I start a container using cgroup namespaces, /lxc/x1. It mounts freezer at /sys/fs/cgroup so it has field three of mountinfo as /lxc/x1, and /sys/fs/cgroup/ is the path to the container's cgroup (/lxc/x1). In that container, I start another container x1, not using cgroup namespaces. It also wants a cgroup mount, and a common way to handle that (to prevent container rewriting its limits) is to mount a tmpfs at /sys/fs/cgroup, create /sysfs/cgroup/lxc/x1, and bind mount /sys/fs/cgroup/lxc/x1 from the parent container onto /sys/fs/cgroup/lxc/x1 in the child container. Now for that bind mount, the mountinfo field 3 will show /lxc/x1/lxc/x1, with mount target /sys/fs/cgroup/lxc/x1, while /proc/self/cgroup for a task in that container will show '/lxc/x1'. Unless it has been moved into /lxc/x1/lxc/x1 in the container (/lxc/x1/lxc/x1/lxc/x1 on the host)... Every time I've thought "maybe we can just..." I've found a case where it wouldn't work. At first in lxc we simply said if /proc/self/ns/cgroup exists assume that the cgroupfs mounts are not bind mounts. However, old userspace (and container drivers) on new kernels is certainly possible, especially an older distro in a container on a newer distro on the host. That completely breaks with this approach. I also personally think there *is* value in letting a task know its place on the system, so hiding the full cgroup path is imo not only not a valid goal, it's counter-productive. Part of making for better virtualization is to give userspace all the info it needs about its current limits. Consider that with the unified hierarchy, you cannot have tasks in a cgroup that also has child cgroups - except for the root. Cgroup namespaces do not make an exception for this, so knowing that you are not in the absolute cgroup root actually can prevent you from trying something that cannot work. Or, I suppose, at least understanding why you're unable to do what you're trying to do (namely your container manager messed up). I point this out because finding a way to only show the namespaced root in field 3 of mountinfo would fix the base problem, but at the cost of hiding useful information from a container. > Eric > > > > > > Signed-off-by: Serge Hallyn <serge.hallyn-GeWIH/nMZzLQT0dZR+AlfA@public.gmane.org> > > --- > > Changelog: 2016-04-13: pass kernfs_node rather than dentry to show_options > > --- > > fs/kernfs/mount.c | 2 +- > > include/linux/kernfs.h | 3 ++- > > kernel/cgroup.c | 28 +++++++++++++++++++++++++++- > > 3 files changed, 30 insertions(+), 3 deletions(-) > > > > diff --git a/fs/kernfs/mount.c b/fs/kernfs/mount.c > > index f73541f..58e8a86 100644 > > --- a/fs/kernfs/mount.c > > +++ b/fs/kernfs/mount.c > > @@ -36,7 +36,7 @@ static int kernfs_sop_show_options(struct seq_file *sf, struct dentry *dentry) > > struct kernfs_syscall_ops *scops = root->syscall_ops; > > > > if (scops && scops->show_options) > > - return scops->show_options(sf, root); > > + return scops->show_options(sf, dentry->d_fsdata, root); > > return 0; > > } > > > > diff --git a/include/linux/kernfs.h b/include/linux/kernfs.h > > index c06c442..72b4081 100644 > > --- a/include/linux/kernfs.h > > +++ b/include/linux/kernfs.h > > @@ -145,7 +145,8 @@ struct kernfs_node { > > */ > > struct kernfs_syscall_ops { > > int (*remount_fs)(struct kernfs_root *root, int *flags, char *data); > > - int (*show_options)(struct seq_file *sf, struct kernfs_root *root); > > + int (*show_options)(struct seq_file *sf, struct kernfs_node *kn, > > + struct kernfs_root *root); > > > > int (*mkdir)(struct kernfs_node *parent, const char *name, > > umode_t mode); > > diff --git a/kernel/cgroup.c b/kernel/cgroup.c > > index 671dc05..4d26d07 100644 > > --- a/kernel/cgroup.c > > +++ b/kernel/cgroup.c > > @@ -1593,7 +1593,31 @@ static int rebind_subsystems(struct cgroup_root *dst_root, u16 ss_mask) > > return 0; > > } > > > > -static int cgroup_show_options(struct seq_file *seq, > > +static void cgroup_show_nsroot(struct seq_file *seq, struct kernfs_node *knode, > > + struct kernfs_root *kroot) > > +{ > > + char *nsroot; > > + int len, ret; > > + > > + if (!kroot) > > + return; > > + len = kernfs_path_from_node(knode, kroot->kn, NULL, 0); > > + if (len <= 0) > > + return; > > + nsroot = kzalloc(len + 1, GFP_ATOMIC); > > + if (!nsroot) > > + return; > > + ret = kernfs_path_from_node(knode, kroot->kn, nsroot, len + 1); > > + if (ret <= 0 || ret > len) > > + goto out; > > + > > + seq_show_option(seq, "nsroot", nsroot); > > + > > +out: > > + kfree(nsroot); > > +} > > + > > +static int cgroup_show_options(struct seq_file *seq, struct kernfs_node *kn, > > struct kernfs_root *kf_root) > > { > > struct cgroup_root *root = cgroup_root_from_kf(kf_root); > > @@ -1619,6 +1643,8 @@ static int cgroup_show_options(struct seq_file *seq, > > seq_puts(seq, ",clone_children"); > > if (strlen(root->name)) > > seq_show_option(seq, "name", root->name); > > + cgroup_show_nsroot(seq, kn, kf_root); > > + > > return 0; > > } ^ permalink raw reply [flat|nested] 22+ messages in thread
[parent not found: <20160414152747.GA12700-7LNsyQBKDXoIagZqoN9o3w@public.gmane.org>]
* Re: [PATCH] cgroup namespaces: add a 'nsroot=' mountinfo field [not found] ` <20160414152747.GA12700-7LNsyQBKDXoIagZqoN9o3w@public.gmane.org> @ 2016-04-14 16:12 ` Eric W. Biederman [not found] ` <877fg06uf9.fsf-JOvCrm2gF+uungPnsOpG7nhyD016LWXt@public.gmane.org> 0 siblings, 1 reply; 22+ messages in thread From: Eric W. Biederman @ 2016-04-14 16:12 UTC (permalink / raw) To: Serge E. Hallyn Cc: Tejun Heo, linux-api-u79uwXL29TY76Z2rM5mHXA, adityakali-hpIqsD4AKlfQT0dZR+AlfA, Linux Containers, cgroups-u79uwXL29TY76Z2rM5mHXA, lkml "Serge E. Hallyn" <serge-A9i7LUbDfNHQT0dZR+AlfA@public.gmane.org> writes: > Quoting Eric W. Biederman (ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org): >> "Serge E. Hallyn" <serge-A9i7LUbDfNHQT0dZR+AlfA@public.gmane.org> writes: >> >> > This is so that userspace can distinguish a mount made in a cgroup >> > namespace from a bind mount from a cgroup subdirectory. >> >> To do that do you need to print the path, or is an extra option that >> reveals nothing except that it was a cgroup mount sufficient? >> >> Is there any practical difference between a mount in a namespace and a >> bind mount? >> >> Given the way the conversation has been going I think it would be good >> to see the answers to these questions. Perhaps I missed it but I >> haven't seen the answers to those questions. > > Yup, I tried to answer those in my last email, let me try again. > > Let's say I start a container using cgroup namespaces, /lxc/x1. It mounts > freezer at /sys/fs/cgroup so it has field three of mountinfo as /lxc/x1, > and /sys/fs/cgroup/ is the path to the container's cgroup (/lxc/x1). In > that container, I start another container x1, not using cgroup namespaces. > It also wants a cgroup mount, and a common way to handle that (to prevent > container rewriting its limits) is to mount a tmpfs at /sys/fs/cgroup, > create /sysfs/cgroup/lxc/x1, and bind mount /sys/fs/cgroup/lxc/x1 from > the parent container onto /sys/fs/cgroup/lxc/x1 in the child container. > Now for that bind mount, the mountinfo field 3 will show /lxc/x1/lxc/x1, > with mount target /sys/fs/cgroup/lxc/x1, while /proc/self/cgroup for a task > in that container will show '/lxc/x1'. Unless it has been moved into > /lxc/x1/lxc/x1 in the container (/lxc/x1/lxc/x1/lxc/x1 on the host)... > Every time I've thought "maybe we can just..." I've found a case where it > wouldn't work. > > At first in lxc we simply said if /proc/self/ns/cgroup exists assume that > the cgroupfs mounts are not bind mounts. However, old userspace (and > container drivers) on new kernels is certainly possible, especially an > older distro in a container on a newer distro on the host. That completely > breaks with this approach. > > I also personally think there *is* value in letting a task know its > place on the system, so hiding the full cgroup path is imo not only not > a valid goal, it's counter-productive. Part of making for better > virtualization is to give userspace all the info it needs about its > current limits. Consider that with the unified hierarchy, you cannot > have tasks in a cgroup that also has child cgroups - except for the > root. Cgroup namespaces do not make an exception for this, so knowing > that you are not in the absolute cgroup root actually can prevent you > from trying something that cannot work. Or, I suppose, at least > understanding why you're unable to do what you're trying to do (namely > your container manager messed up). I point this out because finding > a way to only show the namespaced root in field 3 of mountinfo would > fix the base problem, but at the cost of hiding useful information > from a container. It is just the superblock show_path method. And regardless of the rest of the usefullness of your mount option implementing show_path appears to be fundamentally the right thing in this context. As that field appears to have the same issue as /proc/self/cgroup. Eric ^ permalink raw reply [flat|nested] 22+ messages in thread
[parent not found: <877fg06uf9.fsf-JOvCrm2gF+uungPnsOpG7nhyD016LWXt@public.gmane.org>]
* Re: [PATCH] cgroup namespaces: add a 'nsroot=' mountinfo field [not found] ` <877fg06uf9.fsf-JOvCrm2gF+uungPnsOpG7nhyD016LWXt@public.gmane.org> @ 2016-04-14 16:38 ` Serge E. Hallyn 2016-04-14 16:43 ` Eric W. Biederman 0 siblings, 1 reply; 22+ messages in thread From: Serge E. Hallyn @ 2016-04-14 16:38 UTC (permalink / raw) To: Eric W. Biederman Cc: Serge E. Hallyn, Tejun Heo, linux-api-u79uwXL29TY76Z2rM5mHXA, adityakali-hpIqsD4AKlfQT0dZR+AlfA, Linux Containers, cgroups-u79uwXL29TY76Z2rM5mHXA, lkml Quoting Eric W. Biederman (ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org): > "Serge E. Hallyn" <serge-A9i7LUbDfNHQT0dZR+AlfA@public.gmane.org> writes: > > > Quoting Eric W. Biederman (ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org): > >> "Serge E. Hallyn" <serge-A9i7LUbDfNHQT0dZR+AlfA@public.gmane.org> writes: > >> > >> > This is so that userspace can distinguish a mount made in a cgroup > >> > namespace from a bind mount from a cgroup subdirectory. > >> > >> To do that do you need to print the path, or is an extra option that > >> reveals nothing except that it was a cgroup mount sufficient? > >> > >> Is there any practical difference between a mount in a namespace and a > >> bind mount? > >> > >> Given the way the conversation has been going I think it would be good > >> to see the answers to these questions. Perhaps I missed it but I > >> haven't seen the answers to those questions. > > > > Yup, I tried to answer those in my last email, let me try again. > > > > Let's say I start a container using cgroup namespaces, /lxc/x1. It mounts > > freezer at /sys/fs/cgroup so it has field three of mountinfo as /lxc/x1, > > and /sys/fs/cgroup/ is the path to the container's cgroup (/lxc/x1). In > > that container, I start another container x1, not using cgroup namespaces. > > It also wants a cgroup mount, and a common way to handle that (to prevent > > container rewriting its limits) is to mount a tmpfs at /sys/fs/cgroup, > > create /sysfs/cgroup/lxc/x1, and bind mount /sys/fs/cgroup/lxc/x1 from > > the parent container onto /sys/fs/cgroup/lxc/x1 in the child container. > > Now for that bind mount, the mountinfo field 3 will show /lxc/x1/lxc/x1, > > with mount target /sys/fs/cgroup/lxc/x1, while /proc/self/cgroup for a task > > in that container will show '/lxc/x1'. Unless it has been moved into > > /lxc/x1/lxc/x1 in the container (/lxc/x1/lxc/x1/lxc/x1 on the host)... > > Every time I've thought "maybe we can just..." I've found a case where it > > wouldn't work. > > > > At first in lxc we simply said if /proc/self/ns/cgroup exists assume that > > the cgroupfs mounts are not bind mounts. However, old userspace (and > > container drivers) on new kernels is certainly possible, especially an > > older distro in a container on a newer distro on the host. That completely > > breaks with this approach. > > > > I also personally think there *is* value in letting a task know its > > place on the system, so hiding the full cgroup path is imo not only not > > a valid goal, it's counter-productive. Part of making for better > > virtualization is to give userspace all the info it needs about its > > current limits. Consider that with the unified hierarchy, you cannot > > have tasks in a cgroup that also has child cgroups - except for the > > root. Cgroup namespaces do not make an exception for this, so knowing > > that you are not in the absolute cgroup root actually can prevent you > > from trying something that cannot work. Or, I suppose, at least > > understanding why you're unable to do what you're trying to do (namely > > your container manager messed up). I point this out because finding > > a way to only show the namespaced root in field 3 of mountinfo would > > fix the base problem, but at the cost of hiding useful information > > from a container. > > It is just the superblock show_path method. And regardless of the rest > of the usefullness of your mount option implementing show_path appears Ugh. Yeah as I've said implementing that would be the other way to go. I'm somewhat loath to give up the extra information, but I can work on that patch later this week. > to be fundamentally the right thing in this context. As that field > appears to have the same issue as /proc/self/cgroup. Well, /proc/self/cgroup could also have been fixed by adding a ':<nsroot>" field to each line, but it's used differently... thanks, -serge ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH] cgroup namespaces: add a 'nsroot=' mountinfo field 2016-04-14 16:38 ` Serge E. Hallyn @ 2016-04-14 16:43 ` Eric W. Biederman 0 siblings, 0 replies; 22+ messages in thread From: Eric W. Biederman @ 2016-04-14 16:43 UTC (permalink / raw) To: Serge E. Hallyn Cc: Tejun Heo, linux-api, adityakali, Linux Containers, cgroups, lkml "Serge E. Hallyn" <serge@hallyn.com> writes: > Quoting Eric W. Biederman (ebiederm@xmission.com): >> "Serge E. Hallyn" <serge@hallyn.com> writes: >> >> > Quoting Eric W. Biederman (ebiederm@xmission.com): >> >> "Serge E. Hallyn" <serge@hallyn.com> writes: >> >> >> >> > This is so that userspace can distinguish a mount made in a cgroup >> >> > namespace from a bind mount from a cgroup subdirectory. >> >> >> >> To do that do you need to print the path, or is an extra option that >> >> reveals nothing except that it was a cgroup mount sufficient? >> >> >> >> Is there any practical difference between a mount in a namespace and a >> >> bind mount? >> >> >> >> Given the way the conversation has been going I think it would be good >> >> to see the answers to these questions. Perhaps I missed it but I >> >> haven't seen the answers to those questions. >> > >> > Yup, I tried to answer those in my last email, let me try again. >> > >> > Let's say I start a container using cgroup namespaces, /lxc/x1. It mounts >> > freezer at /sys/fs/cgroup so it has field three of mountinfo as /lxc/x1, >> > and /sys/fs/cgroup/ is the path to the container's cgroup (/lxc/x1). In >> > that container, I start another container x1, not using cgroup namespaces. >> > It also wants a cgroup mount, and a common way to handle that (to prevent >> > container rewriting its limits) is to mount a tmpfs at /sys/fs/cgroup, >> > create /sysfs/cgroup/lxc/x1, and bind mount /sys/fs/cgroup/lxc/x1 from >> > the parent container onto /sys/fs/cgroup/lxc/x1 in the child container. >> > Now for that bind mount, the mountinfo field 3 will show /lxc/x1/lxc/x1, >> > with mount target /sys/fs/cgroup/lxc/x1, while /proc/self/cgroup for a task >> > in that container will show '/lxc/x1'. Unless it has been moved into >> > /lxc/x1/lxc/x1 in the container (/lxc/x1/lxc/x1/lxc/x1 on the host)... >> > Every time I've thought "maybe we can just..." I've found a case where it >> > wouldn't work. >> > >> > At first in lxc we simply said if /proc/self/ns/cgroup exists assume that >> > the cgroupfs mounts are not bind mounts. However, old userspace (and >> > container drivers) on new kernels is certainly possible, especially an >> > older distro in a container on a newer distro on the host. That completely >> > breaks with this approach. >> > >> > I also personally think there *is* value in letting a task know its >> > place on the system, so hiding the full cgroup path is imo not only not >> > a valid goal, it's counter-productive. Part of making for better >> > virtualization is to give userspace all the info it needs about its >> > current limits. Consider that with the unified hierarchy, you cannot >> > have tasks in a cgroup that also has child cgroups - except for the >> > root. Cgroup namespaces do not make an exception for this, so knowing >> > that you are not in the absolute cgroup root actually can prevent you >> > from trying something that cannot work. Or, I suppose, at least >> > understanding why you're unable to do what you're trying to do (namely >> > your container manager messed up). I point this out because finding >> > a way to only show the namespaced root in field 3 of mountinfo would >> > fix the base problem, but at the cost of hiding useful information >> > from a container. >> >> It is just the superblock show_path method. And regardless of the rest >> of the usefullness of your mount option implementing show_path appears > > Ugh. Yeah as I've said implementing that would be the other way to go. > I'm somewhat loath to give up the extra information, but I can work > on that patch later this week. It sounded like you couldn't see how to implement that which is slightly different. >> to be fundamentally the right thing in this context. As that field >> appears to have the same issue as /proc/self/cgroup. > > Well, /proc/self/cgroup could also have been fixed by adding a > ':<nsroot>" field to each line, but it's used differently... Usage may be a reasonable justification. Mostly I am trying to pry apart the chaos. Not shoot down one approach or another. My current perspective is that irrespective of what we do with a informational mount option (and there seems to be value in that) having mountinfo show the path relative to root, will allow more software to just work without modification. And old software just working when it can seem very valuable. Eric ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH] cgroup namespaces: add a 'nsroot=' mountinfo field 2016-04-14 15:27 ` Serge E. Hallyn [not found] ` <20160414152747.GA12700-7LNsyQBKDXoIagZqoN9o3w@public.gmane.org> @ 2016-04-15 15:50 ` Aditya Kali [not found] ` <CAGr1F2EZtts38SPDc9cuH1prc6NfUJiwUQmqyRp-RpNYM5UzxA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org> 1 sibling, 1 reply; 22+ messages in thread From: Aditya Kali @ 2016-04-15 15:50 UTC (permalink / raw) To: Serge E. Hallyn Cc: Eric W. Biederman, Tejun Heo, Linux API, Linux Containers, cgroups mailinglist, lkml On Thu, Apr 14, 2016 at 8:27 AM, Serge E. Hallyn <serge@hallyn.com> wrote: > Quoting Eric W. Biederman (ebiederm@xmission.com): >> "Serge E. Hallyn" <serge@hallyn.com> writes: >> >> > This is so that userspace can distinguish a mount made in a cgroup >> > namespace from a bind mount from a cgroup subdirectory. >> >> To do that do you need to print the path, or is an extra option that >> reveals nothing except that it was a cgroup mount sufficient? >> >> Is there any practical difference between a mount in a namespace and a >> bind mount? >> >> Given the way the conversation has been going I think it would be good >> to see the answers to these questions. Perhaps I missed it but I >> haven't seen the answers to those questions. > > Yup, I tried to answer those in my last email, let me try again. > > Let's say I start a container using cgroup namespaces, /lxc/x1. It mounts > freezer at /sys/fs/cgroup so it has field three of mountinfo as /lxc/x1, > and /sys/fs/cgroup/ is the path to the container's cgroup (/lxc/x1). In > that container, I start another container x1, not using cgroup namespaces. > It also wants a cgroup mount, and a common way to handle that (to prevent > container rewriting its limits) is to mount a tmpfs at /sys/fs/cgroup, > create /sysfs/cgroup/lxc/x1, and bind mount /sys/fs/cgroup/lxc/x1 from > the parent container onto /sys/fs/cgroup/lxc/x1 in the child container. > Now for that bind mount, the mountinfo field 3 will show /lxc/x1/lxc/x1, > with mount target /sys/fs/cgroup/lxc/x1, while /proc/self/cgroup for a task > in that container will show '/lxc/x1'. Unless it has been moved into > /lxc/x1/lxc/x1 in the container (/lxc/x1/lxc/x1/lxc/x1 on the host)... > Every time I've thought "maybe we can just..." I've found a case where it > wouldn't work. > > At first in lxc we simply said if /proc/self/ns/cgroup exists assume that > the cgroupfs mounts are not bind mounts. However, old userspace (and > container drivers) on new kernels is certainly possible, especially an > older distro in a container on a newer distro on the host. That completely > breaks with this approach. > My main concern regarding making this a new kernel API is that its too generic and exposes information about all system cgroups to every process on the system, not just the container or the process inside it that needs it. Not all containers need this information and not all processes running inside the container needs this. I haven't spent too much thought into it, but it seems you will still need to update the container userspace to read this extra mount option. So seems like a simpler approach where the host "cgroup manager" provides this information to specific container cgroup manager via other user-space channels (a config file, command-line args, environment vars, proper container mounts, etc.) may also work, right? > I also personally think there *is* value in letting a task know its > place on the system, so hiding the full cgroup path is imo not only not > a valid goal, it's counter-productive. Part of making for better > virtualization is to give userspace all the info it needs about its > current limits. Consider that with the unified hierarchy, you cannot > have tasks in a cgroup that also has child cgroups - except for the > root. Cgroup namespaces do not make an exception for this, so knowing > that you are not in the absolute cgroup root actually can prevent you > from trying something that cannot work. Or, I suppose, at least > understanding why you're unable to do what you're trying to do (namely > your container manager messed up). I point this out because finding > a way to only show the namespaced root in field 3 of mountinfo would > fix the base problem, but at the cost of hiding useful information > from a container. > >> Eric >> >> >> > >> > Signed-off-by: Serge Hallyn <serge.hallyn@ubuntu.com> >> > --- >> > Changelog: 2016-04-13: pass kernfs_node rather than dentry to show_options >> > --- >> > fs/kernfs/mount.c | 2 +- >> > include/linux/kernfs.h | 3 ++- >> > kernel/cgroup.c | 28 +++++++++++++++++++++++++++- >> > 3 files changed, 30 insertions(+), 3 deletions(-) >> > >> > diff --git a/fs/kernfs/mount.c b/fs/kernfs/mount.c >> > index f73541f..58e8a86 100644 >> > --- a/fs/kernfs/mount.c >> > +++ b/fs/kernfs/mount.c >> > @@ -36,7 +36,7 @@ static int kernfs_sop_show_options(struct seq_file *sf, struct dentry *dentry) >> > struct kernfs_syscall_ops *scops = root->syscall_ops; >> > >> > if (scops && scops->show_options) >> > - return scops->show_options(sf, root); >> > + return scops->show_options(sf, dentry->d_fsdata, root); >> > return 0; >> > } >> > >> > diff --git a/include/linux/kernfs.h b/include/linux/kernfs.h >> > index c06c442..72b4081 100644 >> > --- a/include/linux/kernfs.h >> > +++ b/include/linux/kernfs.h >> > @@ -145,7 +145,8 @@ struct kernfs_node { >> > */ >> > struct kernfs_syscall_ops { >> > int (*remount_fs)(struct kernfs_root *root, int *flags, char *data); >> > - int (*show_options)(struct seq_file *sf, struct kernfs_root *root); >> > + int (*show_options)(struct seq_file *sf, struct kernfs_node *kn, >> > + struct kernfs_root *root); >> > >> > int (*mkdir)(struct kernfs_node *parent, const char *name, >> > umode_t mode); >> > diff --git a/kernel/cgroup.c b/kernel/cgroup.c >> > index 671dc05..4d26d07 100644 >> > --- a/kernel/cgroup.c >> > +++ b/kernel/cgroup.c >> > @@ -1593,7 +1593,31 @@ static int rebind_subsystems(struct cgroup_root *dst_root, u16 ss_mask) >> > return 0; >> > } >> > >> > -static int cgroup_show_options(struct seq_file *seq, >> > +static void cgroup_show_nsroot(struct seq_file *seq, struct kernfs_node *knode, >> > + struct kernfs_root *kroot) >> > +{ >> > + char *nsroot; >> > + int len, ret; >> > + >> > + if (!kroot) >> > + return; >> > + len = kernfs_path_from_node(knode, kroot->kn, NULL, 0); >> > + if (len <= 0) >> > + return; >> > + nsroot = kzalloc(len + 1, GFP_ATOMIC); >> > + if (!nsroot) >> > + return; >> > + ret = kernfs_path_from_node(knode, kroot->kn, nsroot, len + 1); >> > + if (ret <= 0 || ret > len) >> > + goto out; >> > + >> > + seq_show_option(seq, "nsroot", nsroot); >> > + >> > +out: >> > + kfree(nsroot); >> > +} >> > + >> > +static int cgroup_show_options(struct seq_file *seq, struct kernfs_node *kn, >> > struct kernfs_root *kf_root) >> > { >> > struct cgroup_root *root = cgroup_root_from_kf(kf_root); >> > @@ -1619,6 +1643,8 @@ static int cgroup_show_options(struct seq_file *seq, >> > seq_puts(seq, ",clone_children"); >> > if (strlen(root->name)) >> > seq_show_option(seq, "name", root->name); >> > + cgroup_show_nsroot(seq, kn, kf_root); >> > + >> > return 0; >> > } -- Aditya ^ permalink raw reply [flat|nested] 22+ messages in thread
[parent not found: <CAGr1F2EZtts38SPDc9cuH1prc6NfUJiwUQmqyRp-RpNYM5UzxA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>]
* Re: [PATCH] cgroup namespaces: add a 'nsroot=' mountinfo field [not found] ` <CAGr1F2EZtts38SPDc9cuH1prc6NfUJiwUQmqyRp-RpNYM5UzxA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org> @ 2016-04-15 16:02 ` Serge E. Hallyn 0 siblings, 0 replies; 22+ messages in thread From: Serge E. Hallyn @ 2016-04-15 16:02 UTC (permalink / raw) To: Aditya Kali Cc: Serge E. Hallyn, Eric W. Biederman, Tejun Heo, Linux API, Linux Containers, cgroups mailinglist, lkml Quoting Aditya Kali (adityakali-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org): > On Thu, Apr 14, 2016 at 8:27 AM, Serge E. Hallyn <serge-A9i7LUbDfNHQT0dZR+AlfA@public.gmane.org> wrote: > > Quoting Eric W. Biederman (ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org): > >> "Serge E. Hallyn" <serge-A9i7LUbDfNHQT0dZR+AlfA@public.gmane.org> writes: > >> > >> > This is so that userspace can distinguish a mount made in a cgroup > >> > namespace from a bind mount from a cgroup subdirectory. > >> > >> To do that do you need to print the path, or is an extra option that > >> reveals nothing except that it was a cgroup mount sufficient? > >> > >> Is there any practical difference between a mount in a namespace and a > >> bind mount? > >> > >> Given the way the conversation has been going I think it would be good > >> to see the answers to these questions. Perhaps I missed it but I > >> haven't seen the answers to those questions. > > > > Yup, I tried to answer those in my last email, let me try again. > > > > Let's say I start a container using cgroup namespaces, /lxc/x1. It mounts > > freezer at /sys/fs/cgroup so it has field three of mountinfo as /lxc/x1, > > and /sys/fs/cgroup/ is the path to the container's cgroup (/lxc/x1). In > > that container, I start another container x1, not using cgroup namespaces. > > It also wants a cgroup mount, and a common way to handle that (to prevent > > container rewriting its limits) is to mount a tmpfs at /sys/fs/cgroup, > > create /sysfs/cgroup/lxc/x1, and bind mount /sys/fs/cgroup/lxc/x1 from > > the parent container onto /sys/fs/cgroup/lxc/x1 in the child container. > > Now for that bind mount, the mountinfo field 3 will show /lxc/x1/lxc/x1, > > with mount target /sys/fs/cgroup/lxc/x1, while /proc/self/cgroup for a task > > in that container will show '/lxc/x1'. Unless it has been moved into > > /lxc/x1/lxc/x1 in the container (/lxc/x1/lxc/x1/lxc/x1 on the host)... > > Every time I've thought "maybe we can just..." I've found a case where it > > wouldn't work. > > > > At first in lxc we simply said if /proc/self/ns/cgroup exists assume that > > the cgroupfs mounts are not bind mounts. However, old userspace (and > > container drivers) on new kernels is certainly possible, especially an > > older distro in a container on a newer distro on the host. That completely > > breaks with this approach. > > > > My main concern regarding making this a new kernel API is that its too > generic and exposes information about all system cgroups to every > process on the system, not just the container or the process inside it > that needs it. Not all containers need this information and not all > processes running inside the container needs this. I haven't spent too > much thought into it, but it seems you will still need to update the > container userspace to read this extra mount option. So seems like a > simpler approach where the host "cgroup manager" provides this > information to specific container cgroup manager via other user-space > channels (a config file, command-line args, environment vars, proper > container mounts, etc.) may also work, right? No, because existing legacy userspace would need to be taught about these new channels. I'm testing a new patch which simply fixes the root dentry field in mountinfo, which should also serve to fix this problem without adding the nsroot= option field. ^ permalink raw reply [flat|nested] 22+ messages in thread
end of thread, other threads:[~2016-04-15 16:02 UTC | newest] Thread overview: 22+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2016-03-21 23:41 [RFC PATCH] cgroup namespaces: add a 'nsroot=' mountinfo field Serge E. Hallyn [not found] ` <20160321234133.GA22463-7LNsyQBKDXoIagZqoN9o3w@public.gmane.org> 2016-03-29 1:12 ` Serge E. Hallyn [not found] ` <20160329011203.GA8974-7LNsyQBKDXoIagZqoN9o3w@public.gmane.org> 2016-03-30 18:58 ` Tejun Heo 2016-03-29 13:58 ` Tycho Andersen 2016-03-29 20:00 ` Serge E. Hallyn [not found] ` <20160329200018.GA21908-7LNsyQBKDXoIagZqoN9o3w@public.gmane.org> 2016-03-30 17:21 ` [PATCH] cgroup mount: ignore nsroot= Serge E. Hallyn [not found] ` <20160330172100.GA11373-7LNsyQBKDXoIagZqoN9o3w@public.gmane.org> 2016-03-30 18:09 ` Tycho Andersen 2016-04-13 17:57 ` [RFC PATCH] cgroup namespaces: add a 'nsroot=' mountinfo field Tejun Heo [not found] ` <20160413175736.GC3676-piEFEHQLUPpN0TnZuCh8vA@public.gmane.org> 2016-04-13 18:46 ` Serge E. Hallyn [not found] ` <20160413184639.GA29483-7LNsyQBKDXoIagZqoN9o3w@public.gmane.org> 2016-04-13 18:50 ` Tejun Heo [not found] ` <20160413185033.GH3676-piEFEHQLUPpN0TnZuCh8vA@public.gmane.org> 2016-04-13 19:01 ` Serge E. Hallyn [not found] ` <20160413190152.GA29753-7LNsyQBKDXoIagZqoN9o3w@public.gmane.org> 2016-04-13 19:12 ` Tejun Heo 2016-04-13 23:31 ` Aditya Kali [not found] ` <CAGr1F2HXJ1BdMFY+vF40O_khE+4S7OnbQPv-h1Q_AmGGhL7mzw-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org> 2016-04-13 23:52 ` Serge E. Hallyn 2016-04-14 4:04 ` [PATCH] " Serge E. Hallyn [not found] ` <20160414040436.GA3739-7LNsyQBKDXoIagZqoN9o3w@public.gmane.org> 2016-04-14 14:42 ` Eric W. Biederman [not found] ` <87oa9c6ymf.fsf-JOvCrm2gF+uungPnsOpG7nhyD016LWXt@public.gmane.org> 2016-04-14 15:27 ` Serge E. Hallyn [not found] ` <20160414152747.GA12700-7LNsyQBKDXoIagZqoN9o3w@public.gmane.org> 2016-04-14 16:12 ` Eric W. Biederman [not found] ` <877fg06uf9.fsf-JOvCrm2gF+uungPnsOpG7nhyD016LWXt@public.gmane.org> 2016-04-14 16:38 ` Serge E. Hallyn 2016-04-14 16:43 ` Eric W. Biederman 2016-04-15 15:50 ` Aditya Kali [not found] ` <CAGr1F2EZtts38SPDc9cuH1prc6NfUJiwUQmqyRp-RpNYM5UzxA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org> 2016-04-15 16:02 ` Serge E. Hallyn
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).