* [RFC][PATCH 0/8][v2]: Enable multiple mounts of devpts
@ 2008-08-21 2:21 sukadev-r/Jw6+rmf7HQT0dZR+AlfA
[not found] ` <20080821022126.GA29449-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
0 siblings, 1 reply; 51+ messages in thread
From: sukadev-r/Jw6+rmf7HQT0dZR+AlfA @ 2008-08-21 2:21 UTC (permalink / raw)
To: hpa-YMNOUZJC4hwAvxtiuMwx3w, alan-qBU/x9rampVanCEyBjwyrvXRex20P6io
Cc: kyle-hoO6YkzgTuCM0SS3m2neIg, ebiederm-aS9lmoZGLiVWk0Htik3J/w,
bastian-yyjItF7Rl6lg9hUCZPvPmw, containers-qjLDD68F18O7TbgM5vRIOg,
xemul-GEFAQzZX7r8dnm+yROfE0A
I have been trying to address comments from the last patchset and
here is another snapshot of the patches.
Appreciate any feedback on the single/multi-mount semantics (patch 8/8).
See below for changelog and pending stuff
---
Enable multiple mounts of devpts filesystem so each container can allocate
ptys independently.
User interface:
Supporting multiple mounts of devpts can break user-space, this
feature is enabled only under a new mount option (-o newmnt).
If this option is never specified, this patchset should not change
any behavior. i.e existing single-namespace semantics are preserved
across any remounts of /dev/pts (as long as none of them use the new
option). So old startup scripts should continue to work.
If the '-o newmnt' option is specified, then a new 'private' mount
of devpts is created. ptys in this private instance are independent
of ptys created in other devpts instances. For this to be fully
functionaly, /dev/ptmx must be a symbolic link to '/dev/pts/ptmx'
Implementation notes:
1. To enable multiple mounts of /dev/pts, (most) devpts interfaces
need to know which instance of devpts is being accessed. This
patchset uses the 'struct inode' of the device being accessed
to identify the appropriate devpts instance. Hence the need for
the /dev/pts/ptmx symlink.
2. See comments in get_sb_ref() in fs/super.c (could not find
existing interfaces that accomplish it) Or is there a better
way ?
[PATCH 1/8]: /dev/tty tweak in init_dev()
[PATCH 2/8]: Add inode parameter to devpts interfaces
[PATCH 3/8]: Remove devpts_root global
[PATCH 4/8]: Per-mount allocated_ptys
[PATCH 5/8]: Per-mount config object
[PATCH 6/8]: Extract option parsing to new function
[PATCH 7/8]: Auto-create ptmx node when mounting devpts
[PATCH 8/8]: Enable multiple mounts of devpts
Changelog [v2]:
- New mount option '-o newmnt' added (patch 8/8)
- Support both single-mount and multi-mount semantics (patch 8/8)
- Automatically create ptmx node when devpts is mounted (patch 7/8)
- Extract option parsing code to new function (patch 6/8)
- Make 'config' params per-mount variables (patch 5/8)
- Slightly re-ordered existing patches in set (patches 1/8, 2/8)
TODO:
- Add a config option to enable multiple-mounts of devpts.
- Remove even initial kernel mount of devpts ? (If we do, how
do we preserve single-mount semantics) ?
- Testing :-), check for bisect-safe etc
^ permalink raw reply [flat|nested] 51+ messages in thread[parent not found: <20080821022126.GA29449-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>]
* [RFC][PATCH 1/8]: /dev/tty tweak in init_dev() [not found] ` <20080821022126.GA29449-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org> @ 2008-08-21 2:26 ` sukadev-r/Jw6+rmf7HQT0dZR+AlfA [not found] ` <20080821022621.GA29658-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org> 2008-08-21 2:26 ` [RFC][PATCH 2/8]: Add inode parameter devpts interfaces sukadev-r/Jw6+rmf7HQT0dZR+AlfA ` (7 subsequent siblings) 8 siblings, 1 reply; 51+ messages in thread From: sukadev-r/Jw6+rmf7HQT0dZR+AlfA @ 2008-08-21 2:26 UTC (permalink / raw) To: hpa-YMNOUZJC4hwAvxtiuMwx3w, alan-qBU/x9rampVanCEyBjwyrvXRex20P6io Cc: kyle-hoO6YkzgTuCM0SS3m2neIg, ebiederm-aS9lmoZGLiVWk0Htik3J/w, bastian-yyjItF7Rl6lg9hUCZPvPmw, containers-qjLDD68F18O7TbgM5vRIOg, xemul-GEFAQzZX7r8dnm+yROfE0A From: Sukadev Bhattiprolu <sukadev-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org> Subject: [RFC][PATCH 1/8]: /dev/tty tweak in init_dev() When opening /dev/tty, __tty_open() finds the tty using get_current_tty(). When __tty_open() calls init_dev() it passes in this tty in '*ret_tty'. init_dev() ignores this and asks devpts again for the tty. Is that really necessary ? The problem with asking devpts again is that with multiple mounts, devpts cannot find the tty without knowing the specific mount instance. We can't find the mount instance of devpts, since the inode of /dev/tty is in a different filesystem. Changelog:[v2]: - minor change due to moving patch up in patchset - added comments TODO: Looks ugly to use '*ret_tty' this way. Rename param or call get_current_tty() ? --- drivers/char/tty_io.c | 12 +++++++++++- 1 file changed, 11 insertions(+), 1 deletion(-) Index: linux-2.6.26-rc8-mm1/drivers/char/tty_io.c =================================================================== --- linux-2.6.26-rc8-mm1.orig/drivers/char/tty_io.c 2008-08-20 12:57:51.000000000 -0700 +++ linux-2.6.26-rc8-mm1/drivers/char/tty_io.c 2008-08-20 12:57:53.000000000 -0700 @@ -2066,7 +2066,17 @@ static int init_dev(struct tty_driver *d /* check whether we're reopening an existing tty */ if (driver->flags & TTY_DRIVER_DEVPTS_MEM) { - tty = devpts_get_tty(idx); + /* + * For /dev/tty, caller already found the tty and saved it in + * *ret_tty. Avoid asking devpts for it since, with support + * for multiple mounts of devpts, we would need to know the + * mount instance. And we can't (easily ?) find the mount + * instance from the inode of /dev/tty. + */ + if (inode->i_rdev == MKDEV(TTYAUX_MAJOR, 0)) + tty = *ret_tty; + else + tty = devpts_get_tty(idx); /* * If we don't have a tty here on a slave open, it's because * the master already started the close process and there's ^ permalink raw reply [flat|nested] 51+ messages in thread
[parent not found: <20080821022621.GA29658-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>]
* Re: [RFC][PATCH 1/8]: /dev/tty tweak in init_dev() [not found] ` <20080821022621.GA29658-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org> @ 2008-08-21 9:19 ` Alan Cox 2008-08-21 9:26 ` Alan Cox 1 sibling, 0 replies; 51+ messages in thread From: Alan Cox @ 2008-08-21 9:19 UTC (permalink / raw) To: sukadev-r/Jw6+rmf7HQT0dZR+AlfA Cc: kyle-hoO6YkzgTuCM0SS3m2neIg, bastian-yyjItF7Rl6lg9hUCZPvPmw, hpa-YMNOUZJC4hwAvxtiuMwx3w, containers-qjLDD68F18O7TbgM5vRIOg, xemul-GEFAQzZX7r8dnm+yROfE0A, ebiederm-aS9lmoZGLiVWk0Htik3J/w On Wed, 20 Aug 2008 19:26:21 -0700 sukadev-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org wrote: > > From: Sukadev Bhattiprolu <sukadev-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org> > Subject: [RFC][PATCH 1/8]: /dev/tty tweak in init_dev() > > When opening /dev/tty, __tty_open() finds the tty using get_current_tty(). > When __tty_open() calls init_dev() it passes in this tty in '*ret_tty'. > init_dev() ignores this and asks devpts again for the tty. Is that really > necessary ? NAK This code has all changed. Please take a look at -next and expect it to continue to do so further. Patches against -next are welcome and there is also more room to rework stuff in -next as we are making big changes anyway. Also why not make use of tty->driver_data. It is there for drivers to use. Alan ^ permalink raw reply [flat|nested] 51+ messages in thread
* Re: [RFC][PATCH 1/8]: /dev/tty tweak in init_dev() [not found] ` <20080821022621.GA29658-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org> 2008-08-21 9:19 ` Alan Cox @ 2008-08-21 9:26 ` Alan Cox 1 sibling, 0 replies; 51+ messages in thread From: Alan Cox @ 2008-08-21 9:26 UTC (permalink / raw) To: sukadev-r/Jw6+rmf7HQT0dZR+AlfA Cc: kyle-hoO6YkzgTuCM0SS3m2neIg, bastian-yyjItF7Rl6lg9hUCZPvPmw, hpa-YMNOUZJC4hwAvxtiuMwx3w, containers-qjLDD68F18O7TbgM5vRIOg, xemul-GEFAQzZX7r8dnm+yROfE0A, ebiederm-aS9lmoZGLiVWk0Htik3J/w Summary from the tty side: - tty changes as proposed are an ugly hack - I'm opposed to magically producing device nodes that were not there before - the pty driver doesn't use ->driver_data which is probably what you want for some of the back walking - tty and pty code in this area is currently in flux - in the -next tree take a look at tty_kref_get and friends, passing kref bumped objects back may also help get a tidier result I'm currently trying to extract out all the pty nasty special case crap from tty_io.c which should actually help get this right. I'd also rather see big changes to the tty open/close logic where neccessary in favour of further hacks on hacks. Alan ^ permalink raw reply [flat|nested] 51+ messages in thread
* [RFC][PATCH 2/8]: Add inode parameter devpts interfaces [not found] ` <20080821022126.GA29449-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org> 2008-08-21 2:26 ` [RFC][PATCH 1/8]: /dev/tty tweak in init_dev() sukadev-r/Jw6+rmf7HQT0dZR+AlfA @ 2008-08-21 2:26 ` sukadev-r/Jw6+rmf7HQT0dZR+AlfA 2008-08-21 2:27 ` [RFC][PATCH 3/8]: Remove devpts_root global sukadev-r/Jw6+rmf7HQT0dZR+AlfA ` (6 subsequent siblings) 8 siblings, 0 replies; 51+ messages in thread From: sukadev-r/Jw6+rmf7HQT0dZR+AlfA @ 2008-08-21 2:26 UTC (permalink / raw) To: hpa-YMNOUZJC4hwAvxtiuMwx3w, alan-qBU/x9rampVanCEyBjwyrvXRex20P6io Cc: kyle-hoO6YkzgTuCM0SS3m2neIg, ebiederm-aS9lmoZGLiVWk0Htik3J/w, bastian-yyjItF7Rl6lg9hUCZPvPmw, containers-qjLDD68F18O7TbgM5vRIOg, xemul-GEFAQzZX7r8dnm+yROfE0A From: Sukadev Bhattiprolu <sukadev-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org> Subject: [RFC][PATCH 2/8]: Add inode parameter devpts interfaces Pass-in an 'inode' parameter to devpts interfaces. The parameter itself will be used in subsequent patches to identify the instance of devpts mounted. --- drivers/char/pty.c | 3 ++- drivers/char/tty_io.c | 21 +++++++++++---------- fs/devpts/inode.c | 10 +++++----- include/linux/devpts_fs.h | 34 ++++++++++++++++++++++++---------- 4 files changed, 42 insertions(+), 26 deletions(-) Index: linux-2.6.26-rc8-mm1/fs/devpts/inode.c =================================================================== --- linux-2.6.26-rc8-mm1.orig/fs/devpts/inode.c 2008-08-20 12:33:21.000000000 -0700 +++ linux-2.6.26-rc8-mm1/fs/devpts/inode.c 2008-08-20 12:34:38.000000000 -0700 @@ -177,7 +177,7 @@ static struct dentry *get_node(int num) return lookup_one_len(s, root, sprintf(s, "%d", num)); } -int devpts_new_index(void) +int devpts_new_index(struct inode *inode) { int index; int idr_ret; @@ -205,14 +205,14 @@ retry: return index; } -void devpts_kill_index(int idx) +void devpts_kill_index(struct inode *inode, int idx) { mutex_lock(&allocated_ptys_lock); idr_remove(&allocated_ptys, idx); mutex_unlock(&allocated_ptys_lock); } -int devpts_pty_new(struct tty_struct *tty) +int devpts_pty_new(struct inode *ptmx_inode, struct tty_struct *tty) { int number = tty->index; /* tty layer puts index from devpts_new_index() in here */ struct tty_driver *driver = tty->driver; @@ -245,7 +245,7 @@ int devpts_pty_new(struct tty_struct *tt return 0; } -struct tty_struct *devpts_get_tty(int number) +struct tty_struct *devpts_get_tty(struct inode *inode, int number) { struct dentry *dentry = get_node(number); struct tty_struct *tty; @@ -262,7 +262,7 @@ struct tty_struct *devpts_get_tty(int nu return tty; } -void devpts_pty_kill(int number) +void devpts_pty_kill(struct inode *inode, int number) { struct dentry *dentry = get_node(number); Index: linux-2.6.26-rc8-mm1/include/linux/devpts_fs.h =================================================================== --- linux-2.6.26-rc8-mm1.orig/include/linux/devpts_fs.h 2008-08-20 12:33:21.000000000 -0700 +++ linux-2.6.26-rc8-mm1/include/linux/devpts_fs.h 2008-08-20 12:34:38.000000000 -0700 @@ -17,20 +17,34 @@ #ifdef CONFIG_UNIX98_PTYS -int devpts_new_index(void); -void devpts_kill_index(int idx); -int devpts_pty_new(struct tty_struct *tty); /* mknod in devpts */ -struct tty_struct *devpts_get_tty(int number); /* get tty structure */ -void devpts_pty_kill(int number); /* unlink */ +int devpts_new_index(struct inode *inode); +void devpts_kill_index(struct inode *inode, int idx); + +/* mknod in devpts */ +int devpts_pty_new(struct inode *inode, struct tty_struct *tty); + +/* get tty structure */ +struct tty_struct *devpts_get_tty(struct inode *inode, int number); + +/* unlink */ +void devpts_pty_kill(struct inode *inode, int number); #else /* Dummy stubs in the no-pty case */ -static inline int devpts_new_index(void) { return -EINVAL; } -static inline void devpts_kill_index(int idx) { } -static inline int devpts_pty_new(struct tty_struct *tty) { return -EINVAL; } -static inline struct tty_struct *devpts_get_tty(int number) { return NULL; } -static inline void devpts_pty_kill(int number) { } +static inline int devpts_new_index(struct inode *inode) { return -EINVAL; } +static inline void devpts_kill_index(struct inode *inode, int idx) { } + +static inline int devpts_pty_new(struc inode *inode, struct tty_struct *tty) +{ + return -EINVAL; +} + +static inline struct tty_struct *devpts_get_tty(struct inode *inode, int number) +{ + return NULL; +} +static inline void devpts_pty_kill(struc inode *inode, int number) { } #endif Index: linux-2.6.26-rc8-mm1/drivers/char/pty.c =================================================================== --- linux-2.6.26-rc8-mm1.orig/drivers/char/pty.c 2008-08-20 12:33:21.000000000 -0700 +++ linux-2.6.26-rc8-mm1/drivers/char/pty.c 2008-08-20 12:34:38.000000000 -0700 @@ -59,7 +59,8 @@ static void pty_close(struct tty_struct set_bit(TTY_OTHER_CLOSED, &tty->flags); #ifdef CONFIG_UNIX98_PTYS if (tty->driver == ptm_driver) - devpts_pty_kill(tty->index); + devpts_pty_kill(filp->f_path.dentry->d_inode, + tty->index); #endif tty_vhangup(tty->link); } Index: linux-2.6.26-rc8-mm1/drivers/char/tty_io.c =================================================================== --- linux-2.6.26-rc8-mm1.orig/drivers/char/tty_io.c 2008-08-20 12:34:36.000000000 -0700 +++ linux-2.6.26-rc8-mm1/drivers/char/tty_io.c 2008-08-20 12:34:38.000000000 -0700 @@ -2056,7 +2056,7 @@ static void tty_line_name(struct tty_dri * relaxed for the (most common) case of reopening a tty. */ -static int init_dev(struct tty_driver *driver, int idx, +static int init_dev(struct tty_driver *driver, struct inode *inode, int idx, struct tty_struct **ret_tty) { struct tty_struct *tty, *o_tty; @@ -2076,7 +2076,7 @@ static int init_dev(struct tty_driver *d if (inode->i_rdev == MKDEV(TTYAUX_MAJOR, 0)) tty = *ret_tty; else - tty = devpts_get_tty(idx); + tty = devpts_get_tty(inode, idx); /* * If we don't have a tty here on a slave open, it's because * the master already started the close process and there's @@ -2380,10 +2380,11 @@ static void release_dev(struct file *fil int idx; char buf[64]; unsigned long flags; + struct inode *inode; + inode = filp->f_path.dentry->d_inode; tty = (struct tty_struct *)filp->private_data; - if (tty_paranoia_check(tty, filp->f_path.dentry->d_inode, - "release_dev")) + if (tty_paranoia_check(tty, inode, "release_dev")) return; check_tty_count(tty, "release_dev"); @@ -2638,7 +2639,7 @@ static void release_dev(struct file *fil /* Make this pty number available for reallocation */ if (devpts) - devpts_kill_index(idx); + devpts_kill_index(inode, idx); } /** @@ -2719,7 +2720,7 @@ retry_open: return -ENODEV; } got_driver: - retval = init_dev(driver, index, &tty); + retval = init_dev(driver, inode, index, &tty); mutex_unlock(&tty_mutex); if (retval) return retval; @@ -2811,12 +2812,12 @@ static int __ptmx_open(struct inode *ino nonseekable_open(inode, filp); /* find a device that is not in use. */ - index = devpts_new_index(); + index = devpts_new_index(inode); if (index < 0) return index; mutex_lock(&tty_mutex); - retval = init_dev(ptm_driver, index, &tty); + retval = init_dev(ptm_driver, inode, index, &tty); mutex_unlock(&tty_mutex); if (retval) @@ -2826,7 +2827,7 @@ static int __ptmx_open(struct inode *ino filp->private_data = tty; file_move(filp, &tty->tty_files); - retval = devpts_pty_new(tty->link); + retval = devpts_pty_new(inode, tty->link); if (retval) goto out1; @@ -2838,7 +2839,7 @@ out1: release_dev(filp); return retval; out: - devpts_kill_index(index); + devpts_kill_index(inode, index); return retval; } ^ permalink raw reply [flat|nested] 51+ messages in thread
* [RFC][PATCH 3/8]: Remove devpts_root global [not found] ` <20080821022126.GA29449-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org> 2008-08-21 2:26 ` [RFC][PATCH 1/8]: /dev/tty tweak in init_dev() sukadev-r/Jw6+rmf7HQT0dZR+AlfA 2008-08-21 2:26 ` [RFC][PATCH 2/8]: Add inode parameter devpts interfaces sukadev-r/Jw6+rmf7HQT0dZR+AlfA @ 2008-08-21 2:27 ` sukadev-r/Jw6+rmf7HQT0dZR+AlfA 2008-08-21 2:27 ` [RFC][PATCH 4/8]: Per-mount allocated_ptys sukadev-r/Jw6+rmf7HQT0dZR+AlfA ` (5 subsequent siblings) 8 siblings, 0 replies; 51+ messages in thread From: sukadev-r/Jw6+rmf7HQT0dZR+AlfA @ 2008-08-21 2:27 UTC (permalink / raw) To: hpa-YMNOUZJC4hwAvxtiuMwx3w, alan-qBU/x9rampVanCEyBjwyrvXRex20P6io Cc: kyle-hoO6YkzgTuCM0SS3m2neIg, ebiederm-aS9lmoZGLiVWk0Htik3J/w, bastian-yyjItF7Rl6lg9hUCZPvPmw, containers-qjLDD68F18O7TbgM5vRIOg, xemul-GEFAQzZX7r8dnm+yROfE0A From: Sukadev Bhattiprolu <sukadev-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org> Subject: [RFC][PATCH 3/8]: Remove devpts_root global Remove the 'devpts_root' global variable and find the root dentry using the super_block. The super-block can be found from the device inode, using the new wrapper, pts_sb_from_inode(). --- fs/devpts/inode.c | 36 ++++++++++++++++++++++++------------ 1 file changed, 24 insertions(+), 12 deletions(-) Index: linux-2.6.26-rc8-mm1/fs/devpts/inode.c =================================================================== --- linux-2.6.26-rc8-mm1.orig/fs/devpts/inode.c 2008-08-20 12:58:30.000000000 -0700 +++ linux-2.6.26-rc8-mm1/fs/devpts/inode.c 2008-08-20 13:16:47.000000000 -0700 @@ -33,7 +33,6 @@ static DEFINE_IDR(allocated_ptys); static DEFINE_MUTEX(allocated_ptys_lock); static struct vfsmount *devpts_mnt; -static struct dentry *devpts_root; static struct { int setuid; @@ -55,6 +54,14 @@ static match_table_t tokens = { {Opt_err, NULL} }; +static inline struct super_block *pts_sb_from_inode(struct inode *inode) +{ + if (inode->i_sb->s_magic == DEVPTS_SUPER_MAGIC) + return inode->i_sb; + + return devpts_mnt->mnt_sb; +} + static int devpts_remount(struct super_block *sb, int *flags, char *data) { char *p; @@ -141,7 +148,7 @@ devpts_fill_super(struct super_block *s, inode->i_fop = &simple_dir_operations; inode->i_nlink = 2; - devpts_root = s->s_root = d_alloc_root(inode); + s->s_root = d_alloc_root(inode); if (s->s_root) return 0; @@ -169,10 +176,9 @@ static struct file_system_type devpts_fs * to the System V naming convention */ -static struct dentry *get_node(int num) +static struct dentry *get_node(struct dentry *root, int num) { char s[12]; - struct dentry *root = devpts_root; mutex_lock(&root->d_inode->i_mutex); return lookup_one_len(s, root, sprintf(s, "%d", num)); } @@ -218,7 +224,9 @@ int devpts_pty_new(struct inode *ptmx_in struct tty_driver *driver = tty->driver; dev_t device = MKDEV(driver->major, driver->minor_start+number); struct dentry *dentry; - struct inode *inode = new_inode(devpts_mnt->mnt_sb); + struct super_block *sb = pts_sb_from_inode(ptmx_inode); + struct inode *inode = new_inode(sb); + struct dentry *root = sb->s_root; /* We're supposed to be given the slave end of a pty */ BUG_ON(driver->type != TTY_DRIVER_TYPE_PTY); @@ -234,20 +242,22 @@ int devpts_pty_new(struct inode *ptmx_in init_special_inode(inode, S_IFCHR|config.mode, device); inode->i_private = tty; - dentry = get_node(number); + dentry = get_node(root, number); if (!IS_ERR(dentry) && !dentry->d_inode) { d_instantiate(dentry, inode); - fsnotify_create(devpts_root->d_inode, dentry); + fsnotify_create(root->d_inode, dentry); } - mutex_unlock(&devpts_root->d_inode->i_mutex); + mutex_unlock(&root->d_inode->i_mutex); return 0; } struct tty_struct *devpts_get_tty(struct inode *inode, int number) { - struct dentry *dentry = get_node(number); + struct super_block *sb = pts_sb_from_inode(inode); + struct dentry *root = sb->s_root; + struct dentry *dentry = get_node(root, number); struct tty_struct *tty; tty = NULL; @@ -257,14 +267,16 @@ struct tty_struct *devpts_get_tty(struct dput(dentry); } - mutex_unlock(&devpts_root->d_inode->i_mutex); + mutex_unlock(&root->d_inode->i_mutex); return tty; } void devpts_pty_kill(struct inode *inode, int number) { - struct dentry *dentry = get_node(number); + struct super_block *sb = pts_sb_from_inode(inode); + struct dentry *root = sb->s_root; + struct dentry *dentry = get_node(root, number); if (!IS_ERR(dentry)) { struct inode *inode = dentry->d_inode; @@ -275,7 +287,7 @@ void devpts_pty_kill(struct inode *inode } dput(dentry); } - mutex_unlock(&devpts_root->d_inode->i_mutex); + mutex_unlock(&root->d_inode->i_mutex); } static int __init init_devpts_fs(void) ^ permalink raw reply [flat|nested] 51+ messages in thread
* [RFC][PATCH 4/8]: Per-mount allocated_ptys [not found] ` <20080821022126.GA29449-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org> ` (2 preceding siblings ...) 2008-08-21 2:27 ` [RFC][PATCH 3/8]: Remove devpts_root global sukadev-r/Jw6+rmf7HQT0dZR+AlfA @ 2008-08-21 2:27 ` sukadev-r/Jw6+rmf7HQT0dZR+AlfA 2008-08-21 2:28 ` [RFC][PATCH 5/8]: Per-mount 'config' object sukadev-r/Jw6+rmf7HQT0dZR+AlfA ` (4 subsequent siblings) 8 siblings, 0 replies; 51+ messages in thread From: sukadev-r/Jw6+rmf7HQT0dZR+AlfA @ 2008-08-21 2:27 UTC (permalink / raw) To: hpa-YMNOUZJC4hwAvxtiuMwx3w, alan-qBU/x9rampVanCEyBjwyrvXRex20P6io Cc: kyle-hoO6YkzgTuCM0SS3m2neIg, ebiederm-aS9lmoZGLiVWk0Htik3J/w, bastian-yyjItF7Rl6lg9hUCZPvPmw, containers-qjLDD68F18O7TbgM5vRIOg, xemul-GEFAQzZX7r8dnm+yROfE0A From: Sukadev Bhattiprolu <sukadev-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org> Subject: [RFC][PATCH 4/8]: Per-mount allocated_ptys To enable multiple mounts of devpts, 'allocated_ptys' must be a per-mount variable rather than a global variable. Move 'allocated_ptys' into the super_block's s_fs_info. Changelog[v2]: Define and use DEVPTS_SB() wrapper. --- fs/devpts/inode.c | 58 +++++++++++++++++++++++++++++++++++++++++++++++------- 1 file changed, 51 insertions(+), 7 deletions(-) Index: linux-2.6.26-rc8-mm1/fs/devpts/inode.c =================================================================== --- linux-2.6.26-rc8-mm1.orig/fs/devpts/inode.c 2008-08-20 13:16:47.000000000 -0700 +++ linux-2.6.26-rc8-mm1/fs/devpts/inode.c 2008-08-20 13:19:17.000000000 -0700 @@ -29,7 +29,6 @@ #define DEVPTS_DEFAULT_MODE 0600 extern int pty_limit; /* Config limit on Unix98 ptys */ -static DEFINE_IDR(allocated_ptys); static DEFINE_MUTEX(allocated_ptys_lock); static struct vfsmount *devpts_mnt; @@ -54,6 +53,15 @@ static match_table_t tokens = { {Opt_err, NULL} }; +struct pts_fs_info { + struct idr allocated_ptys; +}; + +static inline struct pts_fs_info * DEVPTS_SB(struct super_block *sb) +{ + return sb->s_fs_info; +} + static inline struct super_block *pts_sb_from_inode(struct inode *inode) { if (inode->i_sb->s_magic == DEVPTS_SUPER_MAGIC) @@ -125,6 +133,19 @@ static const struct super_operations dev .show_options = devpts_show_options, }; +static void *new_pts_fs_info(void) +{ + struct pts_fs_info *fsi; + + fsi = kmalloc(sizeof(struct pts_fs_info), GFP_KERNEL); + if (fsi) { + idr_init(&fsi->allocated_ptys); + } + printk(KERN_ERR "new_pts_fs_info(): Returning fsi %p\n", fsi); + return fsi; +} + + static int devpts_fill_super(struct super_block *s, void *data, int silent) { @@ -136,9 +157,13 @@ devpts_fill_super(struct super_block *s, s->s_op = &devpts_sops; s->s_time_gran = 1; + s->s_fs_info = new_pts_fs_info(); + if (!s->s_fs_info) + goto fail; + inode = new_inode(s); if (!inode) - goto fail; + goto free_fsi; inode->i_ino = 1; inode->i_mtime = inode->i_atime = inode->i_ctime = CURRENT_TIME; inode->i_blocks = 0; @@ -154,6 +179,9 @@ devpts_fill_super(struct super_block *s, printk("devpts: get root dentry failed\n"); iput(inode); + +free_fsi: + kfree(s->s_fs_info); fail: return -ENOMEM; } @@ -164,11 +192,22 @@ static int devpts_get_sb(struct file_sys return get_sb_single(fs_type, flags, data, devpts_fill_super, mnt); } + +static void devpts_kill_sb(struct super_block *sb) +{ + struct pts_fs_info *fsi = DEVPTS_SB(sb); // rcu ? + + //idr_destroy(&fsi->allocated_ptys); + kfree(fsi); + + kill_anon_super(sb); +} + static struct file_system_type devpts_fs_type = { .owner = THIS_MODULE, .name = "devpts", .get_sb = devpts_get_sb, - .kill_sb = kill_anon_super, + .kill_sb = devpts_kill_sb, }; /* @@ -187,14 +226,16 @@ int devpts_new_index(struct inode *inode { int index; int idr_ret; + struct super_block *sb = pts_sb_from_inode(inode); + struct pts_fs_info *fsi = DEVPTS_SB(sb); // need rcu ? retry: - if (!idr_pre_get(&allocated_ptys, GFP_KERNEL)) { + if (!idr_pre_get(&fsi->allocated_ptys, GFP_KERNEL)) { return -ENOMEM; } mutex_lock(&allocated_ptys_lock); - idr_ret = idr_get_new(&allocated_ptys, NULL, &index); + idr_ret = idr_get_new(&fsi->allocated_ptys, NULL, &index); if (idr_ret < 0) { mutex_unlock(&allocated_ptys_lock); if (idr_ret == -EAGAIN) @@ -203,7 +244,7 @@ retry: } if (index >= pty_limit) { - idr_remove(&allocated_ptys, index); + idr_remove(&fsi->allocated_ptys, index); mutex_unlock(&allocated_ptys_lock); return -EIO; } @@ -213,8 +254,11 @@ retry: void devpts_kill_index(struct inode *inode, int idx) { + struct super_block *sb = pts_sb_from_inode(inode); + struct pts_fs_info *fsi = DEVPTS_SB(sb); // need rcu ? + mutex_lock(&allocated_ptys_lock); - idr_remove(&allocated_ptys, idx); + idr_remove(&fsi->allocated_ptys, idx); mutex_unlock(&allocated_ptys_lock); } ^ permalink raw reply [flat|nested] 51+ messages in thread
* [RFC][PATCH 5/8]: Per-mount 'config' object [not found] ` <20080821022126.GA29449-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org> ` (3 preceding siblings ...) 2008-08-21 2:27 ` [RFC][PATCH 4/8]: Per-mount allocated_ptys sukadev-r/Jw6+rmf7HQT0dZR+AlfA @ 2008-08-21 2:28 ` sukadev-r/Jw6+rmf7HQT0dZR+AlfA 2008-08-21 2:28 ` [RFC][PATCH 6/8]: Extract option parsing to new function sukadev-r/Jw6+rmf7HQT0dZR+AlfA ` (3 subsequent siblings) 8 siblings, 0 replies; 51+ messages in thread From: sukadev-r/Jw6+rmf7HQT0dZR+AlfA @ 2008-08-21 2:28 UTC (permalink / raw) To: hpa-YMNOUZJC4hwAvxtiuMwx3w, alan-qBU/x9rampVanCEyBjwyrvXRex20P6io Cc: kyle-hoO6YkzgTuCM0SS3m2neIg, ebiederm-aS9lmoZGLiVWk0Htik3J/w, bastian-yyjItF7Rl6lg9hUCZPvPmw, containers-qjLDD68F18O7TbgM5vRIOg, xemul-GEFAQzZX7r8dnm+yROfE0A From: Sukadev Bhattiprolu <sukadev-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org> Subject: [RFC][PATCH 5/8]: Per-mount 'config' object With support for multiple mounts of devpts, the 'config' structure really represents per-mount options rather than config parameters. Rename 'config' structure to 'pts_mount_opts' and store it in the super-block. --- fs/devpts/inode.c | 51 ++++++++++++++++++++++++++++++--------------------- 1 file changed, 30 insertions(+), 21 deletions(-) Index: linux-2.6.26-rc8-mm1/fs/devpts/inode.c =================================================================== --- linux-2.6.26-rc8-mm1.orig/fs/devpts/inode.c 2008-08-20 13:19:17.000000000 -0700 +++ linux-2.6.26-rc8-mm1/fs/devpts/inode.c 2008-08-20 14:26:22.000000000 -0700 @@ -33,13 +33,13 @@ static DEFINE_MUTEX(allocated_ptys_lock) static struct vfsmount *devpts_mnt; -static struct { +struct pts_mount_opts { int setuid; int setgid; uid_t uid; gid_t gid; umode_t mode; -} config = {.mode = DEVPTS_DEFAULT_MODE}; +}; enum { Opt_uid, Opt_gid, Opt_mode, @@ -55,6 +55,7 @@ static match_table_t tokens = { struct pts_fs_info { struct idr allocated_ptys; + struct pts_mount_opts mount_opts; }; static inline struct pts_fs_info * DEVPTS_SB(struct super_block *sb) @@ -73,12 +74,14 @@ static inline struct super_block *pts_sb static int devpts_remount(struct super_block *sb, int *flags, char *data) { char *p; + struct pts_fs_info *fsi = DEVPTS_SB(sb); + struct pts_mount_opts *opts = &fsi->mount_opts; - config.setuid = 0; - config.setgid = 0; - config.uid = 0; - config.gid = 0; - config.mode = DEVPTS_DEFAULT_MODE; + opts->setuid = 0; + opts->setgid = 0; + opts->uid = 0; + opts->gid = 0; + opts->mode = DEVPTS_DEFAULT_MODE; while ((p = strsep(&data, ",")) != NULL) { substring_t args[MAX_OPT_ARGS]; @@ -93,19 +96,19 @@ static int devpts_remount(struct super_b case Opt_uid: if (match_int(&args[0], &option)) return -EINVAL; - config.uid = option; - config.setuid = 1; + opts->uid = option; + opts->setuid = 1; break; case Opt_gid: if (match_int(&args[0], &option)) return -EINVAL; - config.gid = option; - config.setgid = 1; + opts->gid = option; + opts->setgid = 1; break; case Opt_mode: if (match_octal(&args[0], &option)) return -EINVAL; - config.mode = option & S_IALLUGO; + opts->mode = option & S_IALLUGO; break; default: printk(KERN_ERR "devpts: called with bogus options\n"); @@ -118,11 +121,14 @@ static int devpts_remount(struct super_b static int devpts_show_options(struct seq_file *seq, struct vfsmount *vfs) { - if (config.setuid) - seq_printf(seq, ",uid=%u", config.uid); - if (config.setgid) - seq_printf(seq, ",gid=%u", config.gid); - seq_printf(seq, ",mode=%03o", config.mode); + struct pts_fs_info *fsi = DEVPTS_SB(vfs->mnt_sb); + struct pts_mount_opts *opts = &fsi->mount_opts; + + if (opts->setuid) + seq_printf(seq, ",uid=%u", opts->uid); + if (opts->setgid) + seq_printf(seq, ",gid=%u", opts->gid); + seq_printf(seq, ",mode=%03o", opts->mode); return 0; } @@ -137,10 +143,11 @@ static void *new_pts_fs_info(void) { struct pts_fs_info *fsi; - fsi = kmalloc(sizeof(struct pts_fs_info), GFP_KERNEL); + fsi = kzalloc(sizeof(struct pts_fs_info), GFP_KERNEL); if (fsi) { idr_init(&fsi->allocated_ptys); } + fsi->mount_opts.mode = DEVPTS_DEFAULT_MODE; printk(KERN_ERR "new_pts_fs_info(): Returning fsi %p\n", fsi); return fsi; } @@ -271,6 +278,8 @@ int devpts_pty_new(struct inode *ptmx_in struct super_block *sb = pts_sb_from_inode(ptmx_inode); struct inode *inode = new_inode(sb); struct dentry *root = sb->s_root; + struct pts_fs_info *fsi = DEVPTS_SB(sb); + struct pts_mount_opts *opts = &fsi->mount_opts; /* We're supposed to be given the slave end of a pty */ BUG_ON(driver->type != TTY_DRIVER_TYPE_PTY); @@ -280,10 +289,10 @@ int devpts_pty_new(struct inode *ptmx_in return -ENOMEM; inode->i_ino = number+2; - inode->i_uid = config.setuid ? config.uid : current->fsuid; - inode->i_gid = config.setgid ? config.gid : current->fsgid; + inode->i_uid = opts->setuid ? opts->uid : current->fsuid; + inode->i_gid = opts->setgid ? opts->gid : current->fsgid; inode->i_mtime = inode->i_atime = inode->i_ctime = CURRENT_TIME; - init_special_inode(inode, S_IFCHR|config.mode, device); + init_special_inode(inode, S_IFCHR|opts->mode, device); inode->i_private = tty; dentry = get_node(root, number); ^ permalink raw reply [flat|nested] 51+ messages in thread
* [RFC][PATCH 6/8]: Extract option parsing to new function [not found] ` <20080821022126.GA29449-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org> ` (4 preceding siblings ...) 2008-08-21 2:28 ` [RFC][PATCH 5/8]: Per-mount 'config' object sukadev-r/Jw6+rmf7HQT0dZR+AlfA @ 2008-08-21 2:28 ` sukadev-r/Jw6+rmf7HQT0dZR+AlfA 2008-08-21 2:29 ` [RFC][PATCH 7/8]: Auto-create ptmx node when mounting devpts sukadev-r/Jw6+rmf7HQT0dZR+AlfA ` (2 subsequent siblings) 8 siblings, 0 replies; 51+ messages in thread From: sukadev-r/Jw6+rmf7HQT0dZR+AlfA @ 2008-08-21 2:28 UTC (permalink / raw) To: hpa-YMNOUZJC4hwAvxtiuMwx3w, alan-qBU/x9rampVanCEyBjwyrvXRex20P6io Cc: kyle-hoO6YkzgTuCM0SS3m2neIg, ebiederm-aS9lmoZGLiVWk0Htik3J/w, bastian-yyjItF7Rl6lg9hUCZPvPmw, containers-qjLDD68F18O7TbgM5vRIOg, xemul-GEFAQzZX7r8dnm+yROfE0A From: Sukadev Bhattiprolu <sukadev-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org> Subject: [RFC][PATCH 6/8]: Extract option parsing to new function Move code to parse mount options into a separate function so it can (later) be shared between mount and remount operations. --- fs/devpts/inode.c | 12 +++++++++--- 1 file changed, 9 insertions(+), 3 deletions(-) Index: linux-2.6.26-rc8-mm1/fs/devpts/inode.c =================================================================== --- linux-2.6.26-rc8-mm1.orig/fs/devpts/inode.c 2008-08-20 14:26:22.000000000 -0700 +++ linux-2.6.26-rc8-mm1/fs/devpts/inode.c 2008-08-20 14:28:39.000000000 -0700 @@ -71,11 +71,9 @@ static inline struct super_block *pts_sb return devpts_mnt->mnt_sb; } -static int devpts_remount(struct super_block *sb, int *flags, char *data) +static int parse_mount_options(char *data, struct pts_mount_opts *opts) { char *p; - struct pts_fs_info *fsi = DEVPTS_SB(sb); - struct pts_mount_opts *opts = &fsi->mount_opts; opts->setuid = 0; opts->setgid = 0; @@ -119,6 +117,14 @@ static int devpts_remount(struct super_b return 0; } +static int devpts_remount(struct super_block *sb, int *flags, char *data) +{ + struct pts_fs_info *fsi = DEVPTS_SB(sb); + struct pts_mount_opts *opts = &fsi->mount_opts; + + return parse_mount_options(data, opts); +} + static int devpts_show_options(struct seq_file *seq, struct vfsmount *vfs) { struct pts_fs_info *fsi = DEVPTS_SB(vfs->mnt_sb); ^ permalink raw reply [flat|nested] 51+ messages in thread
* [RFC][PATCH 7/8]: Auto-create ptmx node when mounting devpts [not found] ` <20080821022126.GA29449-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org> ` (5 preceding siblings ...) 2008-08-21 2:28 ` [RFC][PATCH 6/8]: Extract option parsing to new function sukadev-r/Jw6+rmf7HQT0dZR+AlfA @ 2008-08-21 2:29 ` sukadev-r/Jw6+rmf7HQT0dZR+AlfA [not found] ` <20080821022908.GG29658-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org> 2008-08-21 2:29 ` [RFC][PATCH 8/8]: Enable multiple mounts of devpts sukadev-r/Jw6+rmf7HQT0dZR+AlfA 2008-08-21 2:45 ` [RFC][PATCH 0/8][v2]: " H. Peter Anvin 8 siblings, 1 reply; 51+ messages in thread From: sukadev-r/Jw6+rmf7HQT0dZR+AlfA @ 2008-08-21 2:29 UTC (permalink / raw) To: hpa-YMNOUZJC4hwAvxtiuMwx3w, alan-qBU/x9rampVanCEyBjwyrvXRex20P6io Cc: kyle-hoO6YkzgTuCM0SS3m2neIg, ebiederm-aS9lmoZGLiVWk0Htik3J/w, bastian-yyjItF7Rl6lg9hUCZPvPmw, containers-qjLDD68F18O7TbgM5vRIOg, xemul-GEFAQzZX7r8dnm+yROfE0A From: Sukadev Bhattiprolu <sukadev-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org> Subject: [RFC][PATCH 7/8]: Auto-create ptmx node when mounting devpts /dev/ptmx is closely tied to the devpts filesystem. An open of /dev/ptmx, allocates the next pty index and the associated device shows up in the devpts fs as /dev/pts/n. Wih multiple mounts of devpts filesystem, an open of /dev/ptmx would be unable to determine which instance of the devpts is being accessed. One solution for this would be to create make /dev/ptmx a symlink to /dev/pts/ptmx and create the device node, ptmx, in each instance of devpts. When /dev/ptmx is opened, we can use the inode of /dev/pts/ptmx to identify the specific devpts instance. This patch has the kernel internally create the [ptmx, c, 5:2] device when mounting devpts filesystem. The permissions for the device node can be specified by the '-o ptmx_mode=0666' option. The default mode is 0666. USER-SPACE IMPACT: This patch has an user-space impact in that the permissions for the device node are specified at mount time. Default mode of 0666 matches current defaults on Ubuntu, Redhat, Hopefully, presence of the 'ptmx' node in /dev/pts does not surprise user space. Changelog[v2]: - [H. Peter Anvin] Remove mknod() system call support and create the ptmx node internally. Changelog[v1]: - Earlier version of this patch enabled creating /dev/pts/tty as well. As pointed out by Al Viro and H. Peter Anvin, that is not really necessary. --- fs/devpts/inode.c | 80 +++++++++++++++++++++++++++++++++++++++++++++++++++--- 1 file changed, 77 insertions(+), 3 deletions(-) Index: linux-2.6.26-rc8-mm1/fs/devpts/inode.c =================================================================== --- linux-2.6.26-rc8-mm1.orig/fs/devpts/inode.c 2008-08-20 14:28:39.000000000 -0700 +++ linux-2.6.26-rc8-mm1/fs/devpts/inode.c 2008-08-20 17:44:29.000000000 -0700 @@ -27,6 +27,7 @@ #define DEVPTS_SUPER_MAGIC 0x1cd1 #define DEVPTS_DEFAULT_MODE 0600 +#define DEVPTS_DEFAULT_PTMX_MODE 0666 extern int pty_limit; /* Config limit on Unix98 ptys */ static DEFINE_MUTEX(allocated_ptys_lock); @@ -39,10 +40,11 @@ struct pts_mount_opts { uid_t uid; gid_t gid; umode_t mode; + umode_t ptmx_mode; }; enum { - Opt_uid, Opt_gid, Opt_mode, + Opt_uid, Opt_gid, Opt_mode, Opt_ptmx_mode, Opt_err }; @@ -50,6 +52,7 @@ static match_table_t tokens = { {Opt_uid, "uid=%u"}, {Opt_gid, "gid=%u"}, {Opt_mode, "mode=%o"}, + {Opt_ptmx_mode, "ptmx_mode=%o"}, {Opt_err, NULL} }; @@ -80,6 +83,7 @@ static int parse_mount_options(char *dat opts->uid = 0; opts->gid = 0; opts->mode = DEVPTS_DEFAULT_MODE; + opts->ptmx_mode = DEVPTS_DEFAULT_PTMX_MODE; while ((p = strsep(&data, ",")) != NULL) { substring_t args[MAX_OPT_ARGS]; @@ -108,6 +112,11 @@ static int parse_mount_options(char *dat return -EINVAL; opts->mode = option & S_IALLUGO; break; + case Opt_ptmx_mode: + if (match_octal(&args[0], &option)) + return -EINVAL; + opts->ptmx_mode = option & S_IALLUGO; + break; default: printk(KERN_ERR "devpts: called with bogus options\n"); return -EINVAL; @@ -135,6 +144,7 @@ static int devpts_show_options(struct se if (opts->setgid) seq_printf(seq, ",gid=%u", opts->gid); seq_printf(seq, ",mode=%03o", opts->mode); + seq_printf(seq, ",ptmx_mode=%03o", opts->ptmx_mode); return 0; } @@ -199,10 +209,74 @@ fail: return -ENOMEM; } +int mknod_ptmx(struct super_block *sb) +{ + struct dentry *root; + struct dentry *dentry; + struct inode *inode; + struct pts_fs_info *fsi = DEVPTS_SB(sb); + struct pts_mount_opts *opts = &fsi->mount_opts; + int mode; + + root = sb->s_root; + dentry = lookup_one_len("ptmx", root, 4); + if (IS_ERR(dentry)) { + printk(KERN_ERR "Unable to alloc dentry for ptmx node\n"); + return -ENOMEM; + } + + if (dentry->d_inode) { + printk(KERN_ERR "'ptmx' (ino %lu) exists in this devpts\n", + dentry->d_inode->i_ino); + return 0; + } + + /* + * Create a new 'ptmx' node in this mount of devpts. + */ + inode = new_inode(sb); + if (!inode) { + printk(KERN_ERR "Unable to alloc inode for ptmx node\n"); + dput(dentry); + return -ENOMEM; + } + + inode->i_uid = inode->i_gid = 0; + inode->i_blocks = 0; + inode->i_mtime = inode->i_atime = inode->i_ctime = CURRENT_TIME; + + mode = S_IFCHR|opts->ptmx_mode; + init_special_inode(inode, mode, MKDEV(TTYAUX_MAJOR, 2)); + + d_add(dentry, inode); // d_instantiate() should be enough ? + + printk(KERN_DEBUG "Created ptmx node in devpts ino %lu\n", + inode->i_ino); + + return 0; +} + static int devpts_get_sb(struct file_system_type *fs_type, int flags, const char *dev_name, void *data, struct vfsmount *mnt) { - return get_sb_single(fs_type, flags, data, devpts_fill_super, mnt); + int err; + + err = get_sb_single(fs_type, flags, data, devpts_fill_super, mnt); + if (err) + return err; + + /* + * Create the 'ptmx' device in the root of the new mount. + * Do this even for the 'single-mount' case so users can choose + * to leave /dev/ptmx -> pts/ptmx symlink as they switch between + * old and new kernels. + */ + err = mknod_ptmx(mnt->mnt_sb); + if (err) { + dput(mnt->mnt_sb->s_root); + deactivate_super(mnt->mnt_sb); + } + return err; } @@ -213,7 +287,7 @@ static void devpts_kill_sb(struct super_ //idr_destroy(&fsi->allocated_ptys); kfree(fsi); - kill_anon_super(sb); + kill_litter_super(sb); } static struct file_system_type devpts_fs_type = { ^ permalink raw reply [flat|nested] 51+ messages in thread
[parent not found: <20080821022908.GG29658-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>]
* Re: [RFC][PATCH 7/8]: Auto-create ptmx node when mounting devpts [not found] ` <20080821022908.GG29658-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org> @ 2008-08-21 9:21 ` Alan Cox [not found] ` <20080821102139.43c44f67-qBU/x9rampVanCEyBjwyrvXRex20P6io@public.gmane.org> 0 siblings, 1 reply; 51+ messages in thread From: Alan Cox @ 2008-08-21 9:21 UTC (permalink / raw) To: sukadev-r/Jw6+rmf7HQT0dZR+AlfA Cc: kyle-hoO6YkzgTuCM0SS3m2neIg, bastian-yyjItF7Rl6lg9hUCZPvPmw, hpa-YMNOUZJC4hwAvxtiuMwx3w, containers-qjLDD68F18O7TbgM5vRIOg, xemul-GEFAQzZX7r8dnm+yROfE0A, ebiederm-aS9lmoZGLiVWk0Htik3J/w > This patch has the kernel internally create the [ptmx, c, 5:2] device > when mounting devpts filesystem. The permissions for the device node > can be specified by the '-o ptmx_mode=0666' option. The default mode > is 0666. NAK > Hopefully, presence of the 'ptmx' node in /dev/pts does not surprise > user space. If you are going to make major changes requiring user space changes to use them then you can change the user space rather than playing "gee where did that come from" with the existing system. Alan ^ permalink raw reply [flat|nested] 51+ messages in thread
[parent not found: <20080821102139.43c44f67-qBU/x9rampVanCEyBjwyrvXRex20P6io@public.gmane.org>]
* Re: [RFC][PATCH 7/8]: Auto-create ptmx node when mounting devpts [not found] ` <20080821102139.43c44f67-qBU/x9rampVanCEyBjwyrvXRex20P6io@public.gmane.org> @ 2008-08-21 16:09 ` H. Peter Anvin [not found] ` <48AD932F.8090908-YMNOUZJC4hwAvxtiuMwx3w@public.gmane.org> 0 siblings, 1 reply; 51+ messages in thread From: H. Peter Anvin @ 2008-08-21 16:09 UTC (permalink / raw) To: Alan Cox Cc: kyle-hoO6YkzgTuCM0SS3m2neIg, bastian-yyjItF7Rl6lg9hUCZPvPmw, containers-qjLDD68F18O7TbgM5vRIOg, xemul-GEFAQzZX7r8dnm+yROfE0A, ebiederm-aS9lmoZGLiVWk0Htik3J/w Alan Cox wrote: >> This patch has the kernel internally create the [ptmx, c, 5:2] device >> when mounting devpts filesystem. The permissions for the device node >> can be specified by the '-o ptmx_mode=0666' option. The default mode >> is 0666. > > NAK > >> Hopefully, presence of the 'ptmx' node in /dev/pts does not surprise >> user space. > > If you are going to make major changes requiring user space changes to > use them then you can change the user space rather than playing "gee > where did that come from" with the existing system. > This particular one I think is the right thing, despite everything. In particular, it makes a *hell* of a lot more sense to have this auto-created, than supporting mknod(2) inside the devpts filesystem. It's not a matter of "changing the user space"; it's a matter of what makes most sense inside the kernel. His new implementation doesn't require user space changes unless you want to take advantages of the multi-instance features. It thus allows for a soft transition. However, soft or hard transition is irrelevant -- creating ptmx inside devpts is necessary, and it doesn't make sense to require userspace to mknod it, since the kernel code ends up being more complex. -hpa ^ permalink raw reply [flat|nested] 51+ messages in thread
[parent not found: <48AD932F.8090908-YMNOUZJC4hwAvxtiuMwx3w@public.gmane.org>]
* Re: [RFC][PATCH 7/8]: Auto-create ptmx node when mounting devpts [not found] ` <48AD932F.8090908-YMNOUZJC4hwAvxtiuMwx3w@public.gmane.org> @ 2008-08-21 16:27 ` Alan Cox [not found] ` <20080821172700.781b0011-qBU/x9rampVanCEyBjwyrvXRex20P6io@public.gmane.org> 0 siblings, 1 reply; 51+ messages in thread From: Alan Cox @ 2008-08-21 16:27 UTC (permalink / raw) To: H. Peter Anvin Cc: kyle-hoO6YkzgTuCM0SS3m2neIg, bastian-yyjItF7Rl6lg9hUCZPvPmw, containers-qjLDD68F18O7TbgM5vRIOg, xemul-GEFAQzZX7r8dnm+yROfE0A, ebiederm-aS9lmoZGLiVWk0Htik3J/w > auto-created, than supporting mknod(2) inside the devpts filesystem. > It's not a matter of "changing the user space"; it's a matter of what > makes most sense inside the kernel. Having an extra node with different permissions suddenely appear without warning isn't I think good behaviour. I'm open to being convinced and the other problems with that code are more pressing. ^ permalink raw reply [flat|nested] 51+ messages in thread
[parent not found: <20080821172700.781b0011-qBU/x9rampVanCEyBjwyrvXRex20P6io@public.gmane.org>]
* Re: [RFC][PATCH 7/8]: Auto-create ptmx node when mounting devpts [not found] ` <20080821172700.781b0011-qBU/x9rampVanCEyBjwyrvXRex20P6io@public.gmane.org> @ 2008-08-21 16:49 ` H. Peter Anvin [not found] ` <48AD9C93.6080302-YMNOUZJC4hwAvxtiuMwx3w@public.gmane.org> 0 siblings, 1 reply; 51+ messages in thread From: H. Peter Anvin @ 2008-08-21 16:49 UTC (permalink / raw) To: Alan Cox Cc: kyle-hoO6YkzgTuCM0SS3m2neIg, bastian-yyjItF7Rl6lg9hUCZPvPmw, containers-qjLDD68F18O7TbgM5vRIOg, xemul-GEFAQzZX7r8dnm+yROfE0A, ebiederm-aS9lmoZGLiVWk0Htik3J/w Alan Cox wrote: >> auto-created, than supporting mknod(2) inside the devpts filesystem. >> It's not a matter of "changing the user space"; it's a matter of what >> makes most sense inside the kernel. > > Having an extra node with different permissions suddenely appear without > warning isn't I think good behaviour. Hm. Given that the single-instance mode is the backwards compatibility mode (and it's accessible from outside the filesystem), it probably makes sense to suppress creating this device node when *not* applying the "newns" option, or whatever we want to call it. > I'm open to being convinced and the > other problems with that code are more pressing. Agreed. -hpa ^ permalink raw reply [flat|nested] 51+ messages in thread
[parent not found: <48AD9C93.6080302-YMNOUZJC4hwAvxtiuMwx3w@public.gmane.org>]
* Re: [RFC][PATCH 7/8]: Auto-create ptmx node when mounting devpts [not found] ` <48AD9C93.6080302-YMNOUZJC4hwAvxtiuMwx3w@public.gmane.org> @ 2008-08-21 17:22 ` Serge E. Hallyn [not found] ` <20080821172245.GA28411-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org> 2008-08-21 17:23 ` sukadev-r/Jw6+rmf7HQT0dZR+AlfA 1 sibling, 1 reply; 51+ messages in thread From: Serge E. Hallyn @ 2008-08-21 17:22 UTC (permalink / raw) To: H. Peter Anvin Cc: kyle-hoO6YkzgTuCM0SS3m2neIg, bastian-yyjItF7Rl6lg9hUCZPvPmw, containers-qjLDD68F18O7TbgM5vRIOg, xemul-GEFAQzZX7r8dnm+yROfE0A, Alan Cox, ebiederm-aS9lmoZGLiVWk0Htik3J/w Quoting H. Peter Anvin (hpa-YMNOUZJC4hwAvxtiuMwx3w@public.gmane.org): > Alan Cox wrote: >>> auto-created, than supporting mknod(2) inside the devpts filesystem. >>> It's not a matter of "changing the user space"; it's a matter of what >>> makes most sense inside the kernel. >> >> Having an extra node with different permissions suddenely appear without >> warning isn't I think good behaviour. > > Hm. Given that the single-instance mode is the backwards compatibility > mode (and it's accessible from outside the filesystem), it probably > makes sense to suppress creating this device node when *not* applying > the "newns" option, or whatever we want to call it. That makes sense. But if Suka does that, then is creating the device when the newns flag is specified ok with you, Alan? >> I'm open to being convinced and the >> other problems with that code are more pressing. > > Agreed. > > -hpa thanks, -serge ^ permalink raw reply [flat|nested] 51+ messages in thread
[parent not found: <20080821172245.GA28411-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>]
* Re: [RFC][PATCH 7/8]: Auto-create ptmx node when mounting devpts [not found] ` <20080821172245.GA28411-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org> @ 2008-08-21 17:07 ` Alan Cox 0 siblings, 0 replies; 51+ messages in thread From: Alan Cox @ 2008-08-21 17:07 UTC (permalink / raw) To: Serge E. Hallyn Cc: kyle-hoO6YkzgTuCM0SS3m2neIg, bastian-yyjItF7Rl6lg9hUCZPvPmw, H. Peter Anvin, containers-qjLDD68F18O7TbgM5vRIOg, xemul-GEFAQzZX7r8dnm+yROfE0A, ebiederm-aS9lmoZGLiVWk0Htik3J/w > That makes sense. But if Suka does that, then is creating the device > when the newns flag is specified ok with you, Alan? Sounds a good basis yes. ^ permalink raw reply [flat|nested] 51+ messages in thread
* Re: [RFC][PATCH 7/8]: Auto-create ptmx node when mounting devpts [not found] ` <48AD9C93.6080302-YMNOUZJC4hwAvxtiuMwx3w@public.gmane.org> 2008-08-21 17:22 ` Serge E. Hallyn @ 2008-08-21 17:23 ` sukadev-r/Jw6+rmf7HQT0dZR+AlfA [not found] ` <20080821172342.GA8059-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org> 1 sibling, 1 reply; 51+ messages in thread From: sukadev-r/Jw6+rmf7HQT0dZR+AlfA @ 2008-08-21 17:23 UTC (permalink / raw) To: H. Peter Anvin Cc: kyle-hoO6YkzgTuCM0SS3m2neIg, bastian-yyjItF7Rl6lg9hUCZPvPmw, containers-qjLDD68F18O7TbgM5vRIOg, xemul-GEFAQzZX7r8dnm+yROfE0A, Alan Cox, ebiederm-aS9lmoZGLiVWk0Htik3J/w H. Peter Anvin [hpa-YMNOUZJC4hwAvxtiuMwx3w@public.gmane.org] wrote: > Alan Cox wrote: >>> auto-created, than supporting mknod(2) inside the devpts filesystem. It's >>> not a matter of "changing the user space"; it's a matter of what makes >>> most sense inside the kernel. >> Having an extra node with different permissions suddenely appear without >> warning isn't I think good behaviour. > > Hm. Given that the single-instance mode is the backwards compatibility > mode (and it's accessible from outside the filesystem), it probably makes > sense to suppress creating this device node when *not* applying the "newns" > option, or whatever we want to call it. I had the new ptmx node only in 'multi-mount' mode initially. But if users want the multi-mount semantics, /dev/ptmx must be a symlink. If its a symlink, we break in the single-mount case (which does not have the ptmx node and we don't support mknod in pts). > >> I'm open to being convinced and the >> other problems with that code are more pressing. Yes, I will look at the latest in linux-next and the ->driver_data approach. But just to confirm, we do want try and keep single-mount semantics. > > Agreed. > > -hpa ^ permalink raw reply [flat|nested] 51+ messages in thread
[parent not found: <20080821172342.GA8059-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>]
* Re: [RFC][PATCH 7/8]: Auto-create ptmx node when mounting devpts [not found] ` <20080821172342.GA8059-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org> @ 2008-08-21 17:38 ` Eric W. Biederman [not found] ` <m18wuqtgj7.fsf-B27657KtZYmhTnVgQlOflh2eb7JE58TQ@public.gmane.org> 2008-08-21 17:40 ` H. Peter Anvin 1 sibling, 1 reply; 51+ messages in thread From: Eric W. Biederman @ 2008-08-21 17:38 UTC (permalink / raw) To: sukadev-r/Jw6+rmf7HQT0dZR+AlfA Cc: kyle-hoO6YkzgTuCM0SS3m2neIg, bastian-yyjItF7Rl6lg9hUCZPvPmw, H. Peter Anvin, containers-qjLDD68F18O7TbgM5vRIOg, Alan Cox, xemul-GEFAQzZX7r8dnm+yROfE0A sukadev-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org writes: > H. Peter Anvin [hpa-YMNOUZJC4hwAvxtiuMwx3w@public.gmane.org] wrote: >> Alan Cox wrote: >>>> auto-created, than supporting mknod(2) inside the devpts filesystem. It's >>>> not a matter of "changing the user space"; it's a matter of what makes >>>> most sense inside the kernel. >>> Having an extra node with different permissions suddenely appear without >>> warning isn't I think good behaviour. >> >> Hm. Given that the single-instance mode is the backwards compatibility >> mode (and it's accessible from outside the filesystem), it probably makes >> sense to suppress creating this device node when *not* applying the "newns" >> option, or whatever we want to call it. > > I had the new ptmx node only in 'multi-mount' mode initially. But if users > want the multi-mount semantics, /dev/ptmx must be a symlink. If its a symlink, > we break in the single-mount case (which does not have the ptmx node and > we don't support mknod in pts). Then have user space make it a file bind mount instead of symlink. That should address all of the backwards compatibility concerns, and allow us to only create it when open. >>> I'm open to being convinced and the >>> other problems with that code are more pressing. > > Yes, I will look at the latest in linux-next and the ->driver_data > approach. > > But just to confirm, we do want try and keep single-mount semantics. Definitely. Eric ^ permalink raw reply [flat|nested] 51+ messages in thread
[parent not found: <m18wuqtgj7.fsf-B27657KtZYmhTnVgQlOflh2eb7JE58TQ@public.gmane.org>]
* Re: [RFC][PATCH 7/8]: Auto-create ptmx node when mounting devpts [not found] ` <m18wuqtgj7.fsf-B27657KtZYmhTnVgQlOflh2eb7JE58TQ@public.gmane.org> @ 2008-08-21 17:50 ` H. Peter Anvin [not found] ` <48ADAAE2.6040700-YMNOUZJC4hwAvxtiuMwx3w@public.gmane.org> 0 siblings, 1 reply; 51+ messages in thread From: H. Peter Anvin @ 2008-08-21 17:50 UTC (permalink / raw) To: Eric W. Biederman Cc: kyle-hoO6YkzgTuCM0SS3m2neIg, bastian-yyjItF7Rl6lg9hUCZPvPmw, containers-qjLDD68F18O7TbgM5vRIOg, Alan Cox, xemul-GEFAQzZX7r8dnm+yROfE0A Eric W. Biederman wrote: >> I had the new ptmx node only in 'multi-mount' mode initially. But if users >> want the multi-mount semantics, /dev/ptmx must be a symlink. If its a symlink, >> we break in the single-mount case (which does not have the ptmx node and >> we don't support mknod in pts). > > Then have user space make it a file bind mount instead of symlink. > That should address all of the backwards compatibility concerns, and > allow us to only create it when open. The right thing is that, if you want to support back-and-forth flipping, to introduce a udev rule which looks for pts/ptmx, links to it if present, and otherwise creates the ptmx device node. This is *only* required to support back-and-forth, and can be introduced at any time after this patch is in the kernel -- or even before. -hpa ^ permalink raw reply [flat|nested] 51+ messages in thread
[parent not found: <48ADAAE2.6040700-YMNOUZJC4hwAvxtiuMwx3w@public.gmane.org>]
* Re: [RFC][PATCH 7/8]: Auto-create ptmx node when mounting devpts [not found] ` <48ADAAE2.6040700-YMNOUZJC4hwAvxtiuMwx3w@public.gmane.org> @ 2008-08-21 18:23 ` Eric W. Biederman [not found] ` <m1hc9eqlbo.fsf-B27657KtZYmhTnVgQlOflh2eb7JE58TQ@public.gmane.org> 0 siblings, 1 reply; 51+ messages in thread From: Eric W. Biederman @ 2008-08-21 18:23 UTC (permalink / raw) To: H. Peter Anvin Cc: kyle-hoO6YkzgTuCM0SS3m2neIg, bastian-yyjItF7Rl6lg9hUCZPvPmw, containers-qjLDD68F18O7TbgM5vRIOg, Alan Cox, xemul-GEFAQzZX7r8dnm+yROfE0A "H. Peter Anvin" <hpa-YMNOUZJC4hwAvxtiuMwx3w@public.gmane.org> writes: > Eric W. Biederman wrote: >>> I had the new ptmx node only in 'multi-mount' mode initially. But if users >>> want the multi-mount semantics, /dev/ptmx must be a symlink. If its a > symlink, >>> we break in the single-mount case (which does not have the ptmx node and >>> we don't support mknod in pts). >> >> Then have user space make it a file bind mount instead of symlink. >> That should address all of the backwards compatibility concerns, and >> allow us to only create it when open. > > The right thing is that, if you want to support back-and-forth flipping, to > introduce a udev rule which looks for pts/ptmx, links to it if present, and > otherwise creates the ptmx device node. The point of making it a bind is to address the concerns about backwards compatibility in user space. In particular security conscious applications and applications that perform sanity checks are known to ignore things if they are the wrong type in the filesystem. > This is *only* required to support back-and-forth, and can be introduced at any > time after this patch is in the kernel -- or even before. You can use a file bind mount just as easily as a symlink. As for udev I haven't seen a version that is accessible to mere mortals yet and it doesn't seem like they plan on it being so. Eventually I will get around to making sense of it as we need to make it work in a container but so far it seems to be much more complex then it should be. Eric ^ permalink raw reply [flat|nested] 51+ messages in thread
[parent not found: <m1hc9eqlbo.fsf-B27657KtZYmhTnVgQlOflh2eb7JE58TQ@public.gmane.org>]
* Re: [RFC][PATCH 7/8]: Auto-create ptmx node when mounting devpts [not found] ` <m1hc9eqlbo.fsf-B27657KtZYmhTnVgQlOflh2eb7JE58TQ@public.gmane.org> @ 2008-08-21 18:36 ` H. Peter Anvin 0 siblings, 0 replies; 51+ messages in thread From: H. Peter Anvin @ 2008-08-21 18:36 UTC (permalink / raw) To: Eric W. Biederman Cc: kyle-hoO6YkzgTuCM0SS3m2neIg, bastian-yyjItF7Rl6lg9hUCZPvPmw, containers-qjLDD68F18O7TbgM5vRIOg, Alan Cox, xemul-GEFAQzZX7r8dnm+yROfE0A Eric W. Biederman wrote: > > The point of making it a bind is to address the concerns about > backwards compatibility in user space. In particular security > conscious applications and applications that perform sanity checks > are known to ignore things if they are the wrong type in the filesystem. > A.k.a. broken applications... >> This is *only* required to support back-and-forth, and can be introduced at any >> time after this patch is in the kernel -- or even before. > > You can use a file bind mount just as easily as a symlink. > > As for udev I haven't seen a version that is accessible to mere mortals yet > and it doesn't seem like they plan on it being so. Eventually I will get > around to making sense of it as we need to make it work in a container > but so far it seems to be much more complex then it should be. I have not had that experience... I find it relatively simple to deal with. The biggest problem is the fact that the rules aren't bundled with the kernel, which causes nasty chicken and egg problems. -hpa ^ permalink raw reply [flat|nested] 51+ messages in thread
* Re: [RFC][PATCH 7/8]: Auto-create ptmx node when mounting devpts [not found] ` <20080821172342.GA8059-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org> 2008-08-21 17:38 ` Eric W. Biederman @ 2008-08-21 17:40 ` H. Peter Anvin [not found] ` <48ADA890.4060309-YMNOUZJC4hwAvxtiuMwx3w@public.gmane.org> 1 sibling, 1 reply; 51+ messages in thread From: H. Peter Anvin @ 2008-08-21 17:40 UTC (permalink / raw) To: sukadev-r/Jw6+rmf7HQT0dZR+AlfA Cc: kyle-hoO6YkzgTuCM0SS3m2neIg, bastian-yyjItF7Rl6lg9hUCZPvPmw, containers-qjLDD68F18O7TbgM5vRIOg, xemul-GEFAQzZX7r8dnm+yROfE0A, Alan Cox, ebiederm-aS9lmoZGLiVWk0Htik3J/w sukadev-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org wrote: > > I had the new ptmx node only in 'multi-mount' mode initially. But if users > want the multi-mount semantics, /dev/ptmx must be a symlink. If its a symlink, > we break in the single-mount case (which does not have the ptmx node and > we don't support mknod in pts). > True, but changing that is still a configuration change (adding newns to the fstab); it's not that much more work to change whatever else needs to change. I personally don't expect a whole lot of back-and-forth; I suspect people will switch from the legacy model to the newns model mostly as part of a distro upgrade. >>> I'm open to being convinced and the >>> other problems with that code are more pressing. > > Yes, I will look at the latest in linux-next and the ->driver_data > approach. > > But just to confirm, we do want try and keep single-mount semantics. Certainly for several years at least. -hpa ^ permalink raw reply [flat|nested] 51+ messages in thread
[parent not found: <48ADA890.4060309-YMNOUZJC4hwAvxtiuMwx3w@public.gmane.org>]
* Re: [RFC][PATCH 7/8]: Auto-create ptmx node when mounting devpts [not found] ` <48ADA890.4060309-YMNOUZJC4hwAvxtiuMwx3w@public.gmane.org> @ 2008-08-21 18:11 ` sukadev-r/Jw6+rmf7HQT0dZR+AlfA [not found] ` <20080821181133.GB8059-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org> 0 siblings, 1 reply; 51+ messages in thread From: sukadev-r/Jw6+rmf7HQT0dZR+AlfA @ 2008-08-21 18:11 UTC (permalink / raw) To: H. Peter Anvin Cc: kyle-hoO6YkzgTuCM0SS3m2neIg, bastian-yyjItF7Rl6lg9hUCZPvPmw, containers-qjLDD68F18O7TbgM5vRIOg, xemul-GEFAQzZX7r8dnm+yROfE0A, Alan Cox, ebiederm-aS9lmoZGLiVWk0Htik3J/w H. Peter Anvin [hpa-YMNOUZJC4hwAvxtiuMwx3w@public.gmane.org] wrote: > sukadev-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org wrote: >> I had the new ptmx node only in 'multi-mount' mode initially. But if users >> want the multi-mount semantics, /dev/ptmx must be a symlink. If its a >> symlink, >> we break in the single-mount case (which does not have the ptmx node and >> we don't support mknod in pts). > > True, but changing that is still a configuration change (adding newns to > the fstab); it's not that much more work to change whatever else needs to > change. Hmm, so, single and multi-mount don't coexist ? i.e some are multi-mounts while others are single-mounts. The way I looked at is that even if a distro has not yet updated the startup script (fstab), we could use the multi-mount. Maybe a container startup script could change /dev/ptmx to symlink and both types of mounts can work simultaneously. Would that be unnecessary ? > > I personally don't expect a whole lot of back-and-forth; I suspect people > will switch from the legacy model to the newns model mostly as part of a > distro upgrade. > >>>> I'm open to being convinced and the >>>> other problems with that code are more pressing. >> Yes, I will look at the latest in linux-next and the ->driver_data >> approach. >> But just to confirm, we do want try and keep single-mount semantics. > > Certainly for several years at least. Ok. > > -hpa ^ permalink raw reply [flat|nested] 51+ messages in thread
[parent not found: <20080821181133.GB8059-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>]
* Re: [RFC][PATCH 7/8]: Auto-create ptmx node when mounting devpts [not found] ` <20080821181133.GB8059-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org> @ 2008-08-21 18:17 ` H. Peter Anvin 2008-08-21 21:00 ` Serge E. Hallyn 1 sibling, 0 replies; 51+ messages in thread From: H. Peter Anvin @ 2008-08-21 18:17 UTC (permalink / raw) To: sukadev-r/Jw6+rmf7HQT0dZR+AlfA Cc: kyle-hoO6YkzgTuCM0SS3m2neIg, bastian-yyjItF7Rl6lg9hUCZPvPmw, containers-qjLDD68F18O7TbgM5vRIOg, xemul-GEFAQzZX7r8dnm+yROfE0A, Alan Cox, ebiederm-aS9lmoZGLiVWk0Htik3J/w sukadev-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org wrote: > > Hmm, so, single and multi-mount don't coexist ? i.e some are multi-mounts > while others are single-mounts. > > The way I looked at is that even if a distro has not yet updated the > startup script (fstab), we could use the multi-mount. Maybe a container > startup script could change /dev/ptmx to symlink and both types of > mounts can work simultaneously. > > Would that be unnecessary ? > Yes, that's unncessary; you can still use a "newinstance" elsewhere in the system (in a container) where you rather by definition don't have a backward compatibility issue. -hpa ^ permalink raw reply [flat|nested] 51+ messages in thread
* Re: [RFC][PATCH 7/8]: Auto-create ptmx node when mounting devpts [not found] ` <20080821181133.GB8059-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org> 2008-08-21 18:17 ` H. Peter Anvin @ 2008-08-21 21:00 ` Serge E. Hallyn [not found] ` <20080821210040.GA14532-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org> 1 sibling, 1 reply; 51+ messages in thread From: Serge E. Hallyn @ 2008-08-21 21:00 UTC (permalink / raw) To: sukadev-r/Jw6+rmf7HQT0dZR+AlfA Cc: kyle-hoO6YkzgTuCM0SS3m2neIg, bastian-yyjItF7Rl6lg9hUCZPvPmw, H. Peter Anvin, containers-qjLDD68F18O7TbgM5vRIOg, xemul-GEFAQzZX7r8dnm+yROfE0A, Alan Cox, ebiederm-aS9lmoZGLiVWk0Htik3J/w Quoting sukadev-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org (sukadev-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org): > H. Peter Anvin [hpa-YMNOUZJC4hwAvxtiuMwx3w@public.gmane.org] wrote: > > sukadev-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org wrote: > >> I had the new ptmx node only in 'multi-mount' mode initially. But if users > >> want the multi-mount semantics, /dev/ptmx must be a symlink. If its a > >> symlink, > >> we break in the single-mount case (which does not have the ptmx node and > >> we don't support mknod in pts). > > > > True, but changing that is still a configuration change (adding newns to > > the fstab); it's not that much more work to change whatever else needs to > > change. > > Hmm, so, single and multi-mount don't coexist ? i.e some are multi-mounts > while others are single-mounts. > > The way I looked at is that even if a distro has not yet updated the > startup script (fstab), we could use the multi-mount. Maybe a container > startup script could change /dev/ptmx to symlink and both types of > mounts can work simultaneously. Suka, I think you are missing Eric's point. You're imagining there is a problem because you can't have a file inthe same /dev be both a device node and a symlink. Eric is pointing out that you don't need a symlink, because the device node can be a target for a bind mount. So you can do mknod /dev/ptmx c 5 2 clone(CLONE_NEWNS) mount --bind /dev/pts/ptmx /dev/ptmx with no problems in either namespace. > Would that be unnecessary ? > > > > > I personally don't expect a whole lot of back-and-forth; I suspect people > > will switch from the legacy model to the newns model mostly as part of a > > distro upgrade. > > > >>>> I'm open to being convinced and the > >>>> other problems with that code are more pressing. > >> Yes, I will look at the latest in linux-next and the ->driver_data > >> approach. > >> But just to confirm, we do want try and keep single-mount semantics. > > > > Certainly for several years at least. > > Ok. > > > > > -hpa ^ permalink raw reply [flat|nested] 51+ messages in thread
[parent not found: <20080821210040.GA14532-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>]
* Re: [RFC][PATCH 7/8]: Auto-create ptmx node when mounting devpts [not found] ` <20080821210040.GA14532-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org> @ 2008-08-21 22:16 ` sukadev-r/Jw6+rmf7HQT0dZR+AlfA 0 siblings, 0 replies; 51+ messages in thread From: sukadev-r/Jw6+rmf7HQT0dZR+AlfA @ 2008-08-21 22:16 UTC (permalink / raw) To: Serge E. Hallyn Cc: kyle-hoO6YkzgTuCM0SS3m2neIg, bastian-yyjItF7Rl6lg9hUCZPvPmw, H. Peter Anvin, containers-qjLDD68F18O7TbgM5vRIOg, xemul-GEFAQzZX7r8dnm+yROfE0A, Alan Cox, ebiederm-aS9lmoZGLiVWk0Htik3J/w Serge E. Hallyn [serue-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org] wrote: | Suka, I think you are missing Eric's point. Yes, sorry I missed it. The bind mount should work. Thanks. ^ permalink raw reply [flat|nested] 51+ messages in thread
* [RFC][PATCH 8/8]: Enable multiple mounts of devpts [not found] ` <20080821022126.GA29449-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org> ` (6 preceding siblings ...) 2008-08-21 2:29 ` [RFC][PATCH 7/8]: Auto-create ptmx node when mounting devpts sukadev-r/Jw6+rmf7HQT0dZR+AlfA @ 2008-08-21 2:29 ` sukadev-r/Jw6+rmf7HQT0dZR+AlfA 2008-08-21 2:45 ` [RFC][PATCH 0/8][v2]: " H. Peter Anvin 8 siblings, 0 replies; 51+ messages in thread From: sukadev-r/Jw6+rmf7HQT0dZR+AlfA @ 2008-08-21 2:29 UTC (permalink / raw) To: hpa-YMNOUZJC4hwAvxtiuMwx3w, alan-qBU/x9rampVanCEyBjwyrvXRex20P6io Cc: kyle-hoO6YkzgTuCM0SS3m2neIg, ebiederm-aS9lmoZGLiVWk0Htik3J/w, bastian-yyjItF7Rl6lg9hUCZPvPmw, containers-qjLDD68F18O7TbgM5vRIOg, xemul-GEFAQzZX7r8dnm+yROfE0A From: Sukadev Bhattiprolu <sukadev-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org> Subject: [RFC][PATCH 8/8]: Enable multiple mounts of devpts To support containers, allow multiple instances of devpts filesystem. But to preserve backward compatibility, provide this support for multiple-mounts under the new mount option, '-o newmnt'. IOW, devpts must support both single-mount and multiple-mount semantics. If the filesystem is mounted without the 'newmnt' option (as in current start-up scripts) the new mount simply binds to the initial kernel mount of devpts and thus current behavior is preserved. If the 'newmnt' option is specified (by new container-startup scripts) a new instance of the devpts fs is created and any ptys created in this instance are independent of the ptys in other mounts of devpts. (Hmm would 'private-mount' be a better name as in MAP_PRIVATE) ? Eg: A container startup script could do the following: $ ns_exec -cm /bin/bash $ umount /dev/pts $ mount -t devpts -o newmnt lxcpts /dev/pts $ sshd -p 6710 where 'ns_exec -cm /bin/bash' is calls clone() with CLONE_NEWNS flag and execs /bin/bash in the child process. A pty created by the sshd is not visible in the original mount of /dev/pts. USER-SPACE-IMPACT: The -onewmnt option is meant to minimize userspace impact. Following are known impacts. 1. /dev/ptmx symlink to pts/ptmx. This is optional if only single- mount semantics is desired but is required if multi-mount semantics. 2. /dev/pts fs has a new entry (ptmx device node) that is created/ destroyed automatically. TODO: Others impacts ? Implementation note: See comments in new get_sb_ref() function in fs/super.c (yes fs/super.c !) on why get_sb_single() cannot be directly used. Changelog[v2]: Support both single-mount and multiple-mount semantics and provide '-onewmnt' option to select the semantics. --- fs/devpts/inode.c | 43 +++++++++++++++++++++++++++++++++++++++++-- fs/super.c | 44 ++++++++++++++++++++++++++++++++++++++++++++ include/linux/fs.h | 2 ++ 3 files changed, 87 insertions(+), 2 deletions(-) Index: linux-2.6.26-rc8-mm1/fs/devpts/inode.c =================================================================== --- linux-2.6.26-rc8-mm1.orig/fs/devpts/inode.c 2008-08-20 17:44:29.000000000 -0700 +++ linux-2.6.26-rc8-mm1/fs/devpts/inode.c 2008-08-20 17:50:42.000000000 -0700 @@ -41,10 +41,11 @@ struct pts_mount_opts { gid_t gid; umode_t mode; umode_t ptmx_mode; + int newmnt; }; enum { - Opt_uid, Opt_gid, Opt_mode, Opt_ptmx_mode, + Opt_uid, Opt_gid, Opt_mode, Opt_ptmx_mode, Opt_newmnt, Opt_err }; @@ -53,6 +54,7 @@ static match_table_t tokens = { {Opt_gid, "gid=%u"}, {Opt_mode, "mode=%o"}, {Opt_ptmx_mode, "ptmx_mode=%o"}, + { Opt_newmnt, "newmnt" }, {Opt_err, NULL} }; @@ -84,6 +86,7 @@ static int parse_mount_options(char *dat opts->gid = 0; opts->mode = DEVPTS_DEFAULT_MODE; opts->ptmx_mode = DEVPTS_DEFAULT_PTMX_MODE; + opts->newmnt = 0; while ((p = strsep(&data, ",")) != NULL) { substring_t args[MAX_OPT_ARGS]; @@ -117,6 +120,9 @@ static int parse_mount_options(char *dat return -EINVAL; opts->ptmx_mode = option & S_IALLUGO; break; + case Opt_newmnt: + opts->newmnt = 1; + break; default: printk(KERN_ERR "devpts: called with bogus options\n"); return -EINVAL; @@ -145,6 +151,8 @@ static int devpts_show_options(struct se seq_printf(seq, ",gid=%u", opts->gid); seq_printf(seq, ",mode=%03o", opts->mode); seq_printf(seq, ",ptmx_mode=%03o", opts->ptmx_mode); + if (opts->newmnt) + seq_printf(seq, ",newmnt"); return 0; } @@ -256,12 +264,43 @@ int mknod_ptmx(struct super_block *sb) return 0; } +static int mount_init_pts(struct file_system_type *fs_type, int flags, + void *data, struct vfsmount *mnt) +{ + int err; + + if (!devpts_mnt) { + err = get_sb_single(fs_type, flags, data, devpts_fill_super, + mnt); + if (!err) + devpts_mnt = mnt; + return err; + } + + err = get_sb_ref(devpts_mnt->mnt_sb, flags, data, mnt); + + printk(KERN_ERR "mount_init_pts(): returning %d\n", err); + return err; +} + static int devpts_get_sb(struct file_system_type *fs_type, int flags, const char *dev_name, void *data, struct vfsmount *mnt) { int err; + struct pts_mount_opts opts; + + if (parse_mount_options((char *)data, &opts)) + return -EINVAL; + + printk(KERN_ERR "devpts_get_sb(): newmnt option is %d\n", opts.newmnt); + + if (opts.newmnt) { + err = get_sb_nodev(fs_type, flags, data, devpts_fill_super, + mnt); + } else { + err = mount_init_pts(fs_type, flags, data, mnt); + } - err = get_sb_single(fs_type, flags, data, devpts_fill_super, mnt); if (err) return err; Index: linux-2.6.26-rc8-mm1/fs/super.c =================================================================== --- linux-2.6.26-rc8-mm1.orig/fs/super.c 2008-08-20 17:44:29.000000000 -0700 +++ linux-2.6.26-rc8-mm1/fs/super.c 2008-08-20 18:07:38.000000000 -0700 @@ -883,6 +883,50 @@ int get_sb_single(struct file_system_typ EXPORT_SYMBOL(get_sb_single); +int get_sb_ref(struct super_block *sb, int flags, void *data, + struct vfsmount *mnt) +{ + int err; + + /* + * UGLY: + * + * This is needed to support multiple mounts in devpts while + * preserving backward compatibility of the current 'single-mount' + * semantics. + * + * devpts cannot simply use get_sb_single(), bc get_sb_single() or + * more specifically, sget() finds the most recent mount of devpts. + * But that recent mount may not the be initial kernel mount (user + * may mounted with the '-onewmnt' option since the initial mount + * and get_sb_single() would pick that super-block). + * + * Caller is responsible to ensure that 'sb' is valid initialized. + * So armed with that fact, unroll essentials of get_sb_single() + * here. + */ + spin_lock(&sb_lock); + + if (!grab_super(sb)) { + /* + * TODO: anymore cleanup ? + */ + return -EAGAIN; + } + + err = do_remount_sb(sb, flags, data, 0); + if (err) { + /* + * (don't deactivate_super() here - its from initial pts mount) + * + * TODO: anymore cleanup ? + */ + up_write(&sb->s_umount); + return err; + } + return simple_set_mnt(mnt, sb); +} + struct vfsmount * vfs_kern_mount(struct file_system_type *type, int flags, const char *name, void *data) { Index: linux-2.6.26-rc8-mm1/include/linux/fs.h =================================================================== --- linux-2.6.26-rc8-mm1.orig/include/linux/fs.h 2008-08-20 17:46:27.000000000 -0700 +++ linux-2.6.26-rc8-mm1/include/linux/fs.h 2008-08-20 17:47:04.000000000 -0700 @@ -1522,6 +1522,8 @@ extern int get_sb_nodev(struct file_syst int flags, void *data, int (*fill_super)(struct super_block *, void *, int), struct vfsmount *mnt); +extern int get_sb_ref(struct super_block *sb, int flags, void *data, + struct vfsmount *mnt); void generic_shutdown_super(struct super_block *sb); void kill_block_super(struct super_block *sb); void kill_anon_super(struct super_block *sb); ^ permalink raw reply [flat|nested] 51+ messages in thread
* Re: [RFC][PATCH 0/8][v2]: Enable multiple mounts of devpts [not found] ` <20080821022126.GA29449-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org> ` (7 preceding siblings ...) 2008-08-21 2:29 ` [RFC][PATCH 8/8]: Enable multiple mounts of devpts sukadev-r/Jw6+rmf7HQT0dZR+AlfA @ 2008-08-21 2:45 ` H. Peter Anvin [not found] ` <48ACD6CB.5030706-YMNOUZJC4hwAvxtiuMwx3w@public.gmane.org> 8 siblings, 1 reply; 51+ messages in thread From: H. Peter Anvin @ 2008-08-21 2:45 UTC (permalink / raw) To: sukadev-r/Jw6+rmf7HQT0dZR+AlfA Cc: kyle-hoO6YkzgTuCM0SS3m2neIg, bastian-yyjItF7Rl6lg9hUCZPvPmw, containers-qjLDD68F18O7TbgM5vRIOg, xemul-GEFAQzZX7r8dnm+yROfE0A, alan-qBU/x9rampVanCEyBjwyrvXRex20P6io, ebiederm-aS9lmoZGLiVWk0Htik3J/w sukadev-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org wrote: > > TODO: > - Remove even initial kernel mount of devpts ? (If we do, how > do we preserve single-mount semantics) ? Doesn't make sense unless we decide to drop single-mount semantics in the (far) future. As long as we have an instance that services unconnected ptmx instances, it makes sense to have that instance available to the kernel at all times. I don't like the name "newmnt" for the option; it is not just another mount, but a whole new instance of the pty space. I observe you didn't incorporate my feedback with regards to get_node(). In this scheme, any and all uses of get_node() are bogus; as such, you're missing the huge opportunity for cleanup that comes along with this whole thing. This means breaking compatibility in one very minor way, which is if people copy device nodes out of /dev/pts, but I am feeling pretty sure that that is much better than carrying the ugliness that goes along with the current code. Furthermore, if there are anyone who do something that silly, they need to fix it anyway. The *entire* implementation of devpts_get_tty(), for example, should look like: struct tty_struct *devpts_get_tty(struct inode *inode) { struct super_block *sb = inode->i_sb; if (sb->s_magic == DEVPTS_SUPER_MAGIC) return (struct tty_struct *)inode->i_private; else return NULL; /* Higher layer should return -ENXIO */ } I really appreciate your tackling this implementation. -hpa ^ permalink raw reply [flat|nested] 51+ messages in thread
[parent not found: <48ACD6CB.5030706-YMNOUZJC4hwAvxtiuMwx3w@public.gmane.org>]
* Re: [RFC][PATCH 0/8][v2]: Enable multiple mounts of devpts [not found] ` <48ACD6CB.5030706-YMNOUZJC4hwAvxtiuMwx3w@public.gmane.org> @ 2008-08-21 3:10 ` sukadev-r/Jw6+rmf7HQT0dZR+AlfA [not found] ` <20080821031028.GB30205-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org> 0 siblings, 1 reply; 51+ messages in thread From: sukadev-r/Jw6+rmf7HQT0dZR+AlfA @ 2008-08-21 3:10 UTC (permalink / raw) To: H. Peter Anvin Cc: kyle-hoO6YkzgTuCM0SS3m2neIg, bastian-yyjItF7Rl6lg9hUCZPvPmw, containers-qjLDD68F18O7TbgM5vRIOg, xemul-GEFAQzZX7r8dnm+yROfE0A, alan-qBU/x9rampVanCEyBjwyrvXRex20P6io, ebiederm-aS9lmoZGLiVWk0Htik3J/w H. Peter Anvin [hpa-YMNOUZJC4hwAvxtiuMwx3w@public.gmane.org] wrote: > sukadev-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org wrote: >> TODO: >> - Remove even initial kernel mount of devpts ? (If we do, how >> do we preserve single-mount semantics) ? > > Doesn't make sense unless we decide to drop single-mount semantics in the > (far) future. As long as we have an instance that services unconnected ptmx > instances, it makes sense to have that instance available to the kernel at > all times. > > I don't like the name "newmnt" for the option; it is not just another > mount, but a whole new instance of the pty space. I agree. Its mostly a place-holder for now. How about newns or newptsns ? > > I observe you didn't incorporate my feedback with regards to get_node(). Yes, I have not addressed it yet. Will look into it in the next pass, but will add to the todo list now. > In this scheme, any and all uses of get_node() are bogus; as such, you're > missing the huge opportunity for cleanup that comes along with this whole > thing. > > This means breaking compatibility in one very minor way, which is if people > copy device nodes out of /dev/pts, but I am feeling pretty sure that that > is much better than carrying the ugliness that goes along with the current > code. Furthermore, if there are anyone who do something that silly, they > need to fix it anyway. > > The *entire* implementation of devpts_get_tty(), for example, should look > like: > > struct tty_struct *devpts_get_tty(struct inode *inode) > { > struct super_block *sb = inode->i_sb; > > if (sb->s_magic == DEVPTS_SUPER_MAGIC) > return (struct tty_struct *)inode->i_private; > else > return NULL; /* Higher layer should return -ENXIO */ > } > > I really appreciate your tackling this implementation. > > -hpa ^ permalink raw reply [flat|nested] 51+ messages in thread
[parent not found: <20080821031028.GB30205-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>]
* Re: [RFC][PATCH 0/8][v2]: Enable multiple mounts of devpts [not found] ` <20080821031028.GB30205-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org> @ 2008-08-21 3:15 ` H. Peter Anvin [not found] ` <48ACDDC7.3000704-YMNOUZJC4hwAvxtiuMwx3w@public.gmane.org> 0 siblings, 1 reply; 51+ messages in thread From: H. Peter Anvin @ 2008-08-21 3:15 UTC (permalink / raw) To: sukadev-r/Jw6+rmf7HQT0dZR+AlfA Cc: kyle-hoO6YkzgTuCM0SS3m2neIg, bastian-yyjItF7Rl6lg9hUCZPvPmw, containers-qjLDD68F18O7TbgM5vRIOg, xemul-GEFAQzZX7r8dnm+yROfE0A, alan-qBU/x9rampVanCEyBjwyrvXRex20P6io, ebiederm-aS9lmoZGLiVWk0Htik3J/w sukadev-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org wrote: >> >> I don't like the name "newmnt" for the option; it is not just another >> mount, but a whole new instance of the pty space. > > I agree. Its mostly a place-holder for now. How about newns or newptsns ? > I suggest "newinstance", but "newns" works, too. >> I observe you didn't incorporate my feedback with regards to get_node(). > > Yes, I have not addressed it yet. Will look into it in the next pass, > but will add to the todo list now. OK. -hpa ^ permalink raw reply [flat|nested] 51+ messages in thread
[parent not found: <48ACDDC7.3000704-YMNOUZJC4hwAvxtiuMwx3w@public.gmane.org>]
* Re: [RFC][PATCH 0/8][v2]: Enable multiple mounts of devpts [not found] ` <48ACDDC7.3000704-YMNOUZJC4hwAvxtiuMwx3w@public.gmane.org> @ 2008-08-21 16:34 ` Cedric Le Goater [not found] ` <48AD991F.9010906-NmTC/0ZBporQT0dZR+AlfA@public.gmane.org> 0 siblings, 1 reply; 51+ messages in thread From: Cedric Le Goater @ 2008-08-21 16:34 UTC (permalink / raw) To: H. Peter Anvin Cc: kyle-hoO6YkzgTuCM0SS3m2neIg, bastian-yyjItF7Rl6lg9hUCZPvPmw, ebiederm-aS9lmoZGLiVWk0Htik3J/w, containers-qjLDD68F18O7TbgM5vRIOg, alan-qBU/x9rampVanCEyBjwyrvXRex20P6io, xemul-GEFAQzZX7r8dnm+yROfE0A H. Peter Anvin wrote: > sukadev-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org wrote: >>> I don't like the name "newmnt" for the option; it is not just another >>> mount, but a whole new instance of the pty space. >> I agree. Its mostly a place-holder for now. How about newns or newptsns ? >> > > I suggest "newinstance", but "newns" works, too. Could we also use this mount option to 'unshare' a new posix message queue namespace ? C. ^ permalink raw reply [flat|nested] 51+ messages in thread
[parent not found: <48AD991F.9010906-NmTC/0ZBporQT0dZR+AlfA@public.gmane.org>]
* Re: [RFC][PATCH 0/8][v2]: Enable multiple mounts of devpts [not found] ` <48AD991F.9010906-NmTC/0ZBporQT0dZR+AlfA@public.gmane.org> @ 2008-08-21 16:40 ` H. Peter Anvin [not found] ` <48AD9A97.6000807-YMNOUZJC4hwAvxtiuMwx3w@public.gmane.org> 0 siblings, 1 reply; 51+ messages in thread From: H. Peter Anvin @ 2008-08-21 16:40 UTC (permalink / raw) To: Cedric Le Goater Cc: kyle-hoO6YkzgTuCM0SS3m2neIg, bastian-yyjItF7Rl6lg9hUCZPvPmw, ebiederm-aS9lmoZGLiVWk0Htik3J/w, containers-qjLDD68F18O7TbgM5vRIOg, alan-qBU/x9rampVanCEyBjwyrvXRex20P6io, xemul-GEFAQzZX7r8dnm+yROfE0A Cedric Le Goater wrote: >>> >> I suggest "newinstance", but "newns" works, too. > > Could we also use this mount option to 'unshare' a new posix message queue > namespace ? > Sorry, I fail to see the connection with devpts here? Are you suggesting using the same option for another filesystem (if so, which)? -hpa ^ permalink raw reply [flat|nested] 51+ messages in thread
[parent not found: <48AD9A97.6000807-YMNOUZJC4hwAvxtiuMwx3w@public.gmane.org>]
* Re: [RFC][PATCH 0/8][v2]: Enable multiple mounts of devpts [not found] ` <48AD9A97.6000807-YMNOUZJC4hwAvxtiuMwx3w@public.gmane.org> @ 2008-08-21 16:54 ` Cedric Le Goater [not found] ` <48AD9DCD.3060306-NmTC/0ZBporQT0dZR+AlfA@public.gmane.org> 0 siblings, 1 reply; 51+ messages in thread From: Cedric Le Goater @ 2008-08-21 16:54 UTC (permalink / raw) To: H. Peter Anvin Cc: kyle-hoO6YkzgTuCM0SS3m2neIg, bastian-yyjItF7Rl6lg9hUCZPvPmw, ebiederm-aS9lmoZGLiVWk0Htik3J/w, containers-qjLDD68F18O7TbgM5vRIOg, alan-qBU/x9rampVanCEyBjwyrvXRex20P6io, xemul-GEFAQzZX7r8dnm+yROfE0A H. Peter Anvin wrote: > Cedric Le Goater wrote: >>>> >>> I suggest "newinstance", but "newns" works, too. >> >> Could we also use this mount option to 'unshare' a new posix message >> queue namespace ? > > Sorry, I fail to see the connection with devpts here? Are you > suggesting using the same option for another filesystem (if so, which)? yes. the posix message queues are also using a single superblock filesystem. If we want isolate them (for container needs for example), we also need to create a new sb. The patchset I have uses a clone flag but using a mount 'newns' really sounds like a better idea. C. ^ permalink raw reply [flat|nested] 51+ messages in thread
[parent not found: <48AD9DCD.3060306-NmTC/0ZBporQT0dZR+AlfA@public.gmane.org>]
* Re: [RFC][PATCH 0/8][v2]: Enable multiple mounts of devpts [not found] ` <48AD9DCD.3060306-NmTC/0ZBporQT0dZR+AlfA@public.gmane.org> @ 2008-08-21 16:56 ` H. Peter Anvin 2008-08-21 17:28 ` Serge E. Hallyn 2008-08-21 17:45 ` Eric W. Biederman 2 siblings, 0 replies; 51+ messages in thread From: H. Peter Anvin @ 2008-08-21 16:56 UTC (permalink / raw) To: Cedric Le Goater Cc: kyle-hoO6YkzgTuCM0SS3m2neIg, bastian-yyjItF7Rl6lg9hUCZPvPmw, ebiederm-aS9lmoZGLiVWk0Htik3J/w, containers-qjLDD68F18O7TbgM5vRIOg, alan-qBU/x9rampVanCEyBjwyrvXRex20P6io, xemul-GEFAQzZX7r8dnm+yROfE0A Cedric Le Goater wrote: > H. Peter Anvin wrote: >> Cedric Le Goater wrote: >>>> I suggest "newinstance", but "newns" works, too. >>> Could we also use this mount option to 'unshare' a new posix message >>> queue namespace ? >> Sorry, I fail to see the connection with devpts here? Are you >> suggesting using the same option for another filesystem (if so, which)? > > yes. the posix message queues are also using a single superblock filesystem. > > If we want isolate them (for container needs for example), we also need to > create a new sb. The patchset I have uses a clone flag but using a mount > 'newns' really sounds like a better idea. > OK, so this is a very good motivation for using a nonspecific term like "instance" or "namespace". -hpa ^ permalink raw reply [flat|nested] 51+ messages in thread
* Re: [RFC][PATCH 0/8][v2]: Enable multiple mounts of devpts [not found] ` <48AD9DCD.3060306-NmTC/0ZBporQT0dZR+AlfA@public.gmane.org> 2008-08-21 16:56 ` H. Peter Anvin @ 2008-08-21 17:28 ` Serge E. Hallyn 2008-08-21 17:45 ` Eric W. Biederman 2 siblings, 0 replies; 51+ messages in thread From: Serge E. Hallyn @ 2008-08-21 17:28 UTC (permalink / raw) To: Cedric Le Goater Cc: kyle-hoO6YkzgTuCM0SS3m2neIg, bastian-yyjItF7Rl6lg9hUCZPvPmw, ebiederm-aS9lmoZGLiVWk0Htik3J/w, H. Peter Anvin, containers-qjLDD68F18O7TbgM5vRIOg, Nadia Derbey, alan-qBU/x9rampVanCEyBjwyrvXRex20P6io, xemul-GEFAQzZX7r8dnm+yROfE0A Quoting Cedric Le Goater (clg-NmTC/0ZBporQT0dZR+AlfA@public.gmane.org): > H. Peter Anvin wrote: > > Cedric Le Goater wrote: > >>>> > >>> I suggest "newinstance", but "newns" works, too. > >> > >> Could we also use this mount option to 'unshare' a new posix message > >> queue namespace ? > > > > Sorry, I fail to see the connection with devpts here? Are you > > suggesting using the same option for another filesystem (if so, which)? > > yes. the posix message queues are also using a single superblock filesystem. > > If we want isolate them (for container needs for example), we also need to > create a new sb. The patchset I have uses a clone flag but using a mount > 'newns' really sounds like a better idea. > > C. Yup, that sounds like a good reason to just use 'newns'. Now the difference with mq is that you have syscalls like mq_open which need to 'just know' which mqns to use. However, Nadia had just started looking at this and I think she was considering tagging the mounts namespace, which just may make sense. -serge ^ permalink raw reply [flat|nested] 51+ messages in thread
* Re: [RFC][PATCH 0/8][v2]: Enable multiple mounts of devpts [not found] ` <48AD9DCD.3060306-NmTC/0ZBporQT0dZR+AlfA@public.gmane.org> 2008-08-21 16:56 ` H. Peter Anvin 2008-08-21 17:28 ` Serge E. Hallyn @ 2008-08-21 17:45 ` Eric W. Biederman [not found] ` <m1fxoys1ng.fsf-B27657KtZYmhTnVgQlOflh2eb7JE58TQ@public.gmane.org> 2 siblings, 1 reply; 51+ messages in thread From: Eric W. Biederman @ 2008-08-21 17:45 UTC (permalink / raw) To: Cedric Le Goater Cc: kyle-hoO6YkzgTuCM0SS3m2neIg, bastian-yyjItF7Rl6lg9hUCZPvPmw, H. Peter Anvin, containers-qjLDD68F18O7TbgM5vRIOg, alan-qBU/x9rampVanCEyBjwyrvXRex20P6io, xemul-GEFAQzZX7r8dnm+yROfE0A Cedric Le Goater <clg-NmTC/0ZBporQT0dZR+AlfA@public.gmane.org> writes: > H. Peter Anvin wrote: >> Cedric Le Goater wrote: >>>>> >>>> I suggest "newinstance", but "newns" works, too. >>> >>> Could we also use this mount option to 'unshare' a new posix message >>> queue namespace ? >> >> Sorry, I fail to see the connection with devpts here? Are you >> suggesting using the same option for another filesystem (if so, which)? > > yes. the posix message queues are also using a single superblock filesystem. > > If we want isolate them (for container needs for example), we also need to > create a new sb. The patchset I have uses a clone flag but using a mount > 'newns' really sounds like a better idea. Let's call it newinstance if we are going to use the same option for devpts. We can update "current->nsproxy->mqueuens" when the newinstance flag is passed and otherwise we can mount whatever is the current mqueue filesystem for the process. That should be simple and just work. Eric ^ permalink raw reply [flat|nested] 51+ messages in thread
[parent not found: <m1fxoys1ng.fsf-B27657KtZYmhTnVgQlOflh2eb7JE58TQ@public.gmane.org>]
* Re: [RFC][PATCH 0/8][v2]: Enable multiple mounts of devpts [not found] ` <m1fxoys1ng.fsf-B27657KtZYmhTnVgQlOflh2eb7JE58TQ@public.gmane.org> @ 2008-08-21 21:02 ` Cedric Le Goater [not found] ` <48ADD7D3.7080400-NmTC/0ZBporQT0dZR+AlfA@public.gmane.org> 0 siblings, 1 reply; 51+ messages in thread From: Cedric Le Goater @ 2008-08-21 21:02 UTC (permalink / raw) To: Eric W. Biederman Cc: kyle-hoO6YkzgTuCM0SS3m2neIg, bastian-yyjItF7Rl6lg9hUCZPvPmw, H. Peter Anvin, containers-qjLDD68F18O7TbgM5vRIOg, alan-qBU/x9rampVanCEyBjwyrvXRex20P6io, xemul-GEFAQzZX7r8dnm+yROfE0A Eric W. Biederman wrote: > Cedric Le Goater <clg-NmTC/0ZBporQT0dZR+AlfA@public.gmane.org> writes: > >> H. Peter Anvin wrote: >>> Cedric Le Goater wrote: >>>>> I suggest "newinstance", but "newns" works, too. >>>> Could we also use this mount option to 'unshare' a new posix message >>>> queue namespace ? >>> Sorry, I fail to see the connection with devpts here? Are you >>> suggesting using the same option for another filesystem (if so, which)? >> yes. the posix message queues are also using a single superblock filesystem. >> >> If we want isolate them (for container needs for example), we also need to >> create a new sb. The patchset I have uses a clone flag but using a mount >> 'newns' really sounds like a better idea. > > Let's call it newinstance if we are going to use the same option for devpts. ok. > We can update "current->nsproxy->mqueuens" when the newinstance flag is passed > and otherwise we can mount whatever is the current mqueue filesystem for > the process. yes. I'll rebase my previous patchset on this idea. thanks, C. > That should be simple and just work. > > Eric ^ permalink raw reply [flat|nested] 51+ messages in thread
[parent not found: <48ADD7D3.7080400-NmTC/0ZBporQT0dZR+AlfA@public.gmane.org>]
* Re: [RFC][PATCH 0/8][v2]: Enable multiple mounts of devpts [not found] ` <48ADD7D3.7080400-NmTC/0ZBporQT0dZR+AlfA@public.gmane.org> @ 2008-08-29 9:02 ` Cedric Le Goater [not found] ` <48B7BB3C.5080404-NmTC/0ZBporQT0dZR+AlfA@public.gmane.org> 0 siblings, 1 reply; 51+ messages in thread From: Cedric Le Goater @ 2008-08-29 9:02 UTC (permalink / raw) To: Cedric Le Goater Cc: kyle-hoO6YkzgTuCM0SS3m2neIg, Dave Hansen, bastian-yyjItF7Rl6lg9hUCZPvPmw, Eric W. Biederman, H. Peter Anvin, containers-qjLDD68F18O7TbgM5vRIOg, alan-qBU/x9rampVanCEyBjwyrvXRex20P6io, xemul-GEFAQzZX7r8dnm+yROfE0A Cedric Le Goater wrote: > Eric W. Biederman wrote: >> Cedric Le Goater <clg-NmTC/0ZBporQT0dZR+AlfA@public.gmane.org> writes: >> >>> H. Peter Anvin wrote: >>>> Cedric Le Goater wrote: >>>>>> I suggest "newinstance", but "newns" works, too. >>>>> Could we also use this mount option to 'unshare' a new posix message >>>>> queue namespace ? >>>> Sorry, I fail to see the connection with devpts here? Are you >>>> suggesting using the same option for another filesystem (if so, which)? >>> yes. the posix message queues are also using a single superblock filesystem. >>> >>> If we want isolate them (for container needs for example), we also need to >>> create a new sb. The patchset I have uses a clone flag but using a mount >>> 'newns' really sounds like a better idea. >> Let's call it newinstance if we are going to use the same option for devpts. > > ok. > >> We can update "current->nsproxy->mqueuens" when the newinstance flag is passed >> and otherwise we can mount whatever is the current mqueue filesystem for >> the process. > > yes. I'll rebase my previous patchset on this idea. Hello Eric, I've spent some time on the code and I'm facing some issues with the nsproxy API if we are to keep the mqueue namespace in nsproxy: int copy_namespaces(unsigned long flags, struct task_struct *tsk); void exit_task_namespaces(struct task_struct *tsk); void switch_task_namespaces(struct task_struct *tsk, struct nsproxy *new); void free_nsproxy(struct nsproxy *ns); int unshare_nsproxy_namespaces(unsigned long, struct nsproxy **, struct fs_struct *); nsproxy designed to work closely with the clone flags and it is not well suited to be called elsewhere than clone/unshare. So I could either : (1) make a special case for the mqueue namespace and duplicate some code to unshare it from ->get_sb() when the option 'newinstance' is used. (2) to avoid duplicating code, use a clone_flags to unshare the mqueue namespace from ->get_sb() when the option 'newinstance' is used. that sounds silly because we might as well use sys_unshare() in that case. (3) move mq_ns out of nsproxy. where shall I put it then ? (3.1) task_struct ? (3.2) mnt namespace maybe ? BTW, have you taken a look at what dave resent in July ? https://lists.linux-foundation.org/pipermail/containers/2008-July/011776.html This is the mqueue namespace patchset using CLONE_NEWIPC to unshare. I wonder if there are arguments against that approach. I might have forgotten some of the issues with unshare as I haven't looked at the code for some time. Thanks ! C. ^ permalink raw reply [flat|nested] 51+ messages in thread
[parent not found: <48B7BB3C.5080404-NmTC/0ZBporQT0dZR+AlfA@public.gmane.org>]
* Re: [RFC][PATCH 0/8][v2]: Enable multiple mounts of devpts [not found] ` <48B7BB3C.5080404-NmTC/0ZBporQT0dZR+AlfA@public.gmane.org> @ 2008-09-02 3:04 ` Serge E. Hallyn [not found] ` <20080902030426.GB12277-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org> 2008-09-02 9:22 ` Eric W. Biederman 1 sibling, 1 reply; 51+ messages in thread From: Serge E. Hallyn @ 2008-09-02 3:04 UTC (permalink / raw) To: Cedric Le Goater Cc: kyle-hoO6YkzgTuCM0SS3m2neIg, Dave Hansen, bastian-yyjItF7Rl6lg9hUCZPvPmw, Eric W. Biederman, H. Peter Anvin, containers-qjLDD68F18O7TbgM5vRIOg, alan-qBU/x9rampVanCEyBjwyrvXRex20P6io, xemul-GEFAQzZX7r8dnm+yROfE0A Quoting Cedric Le Goater (clg-NmTC/0ZBporQT0dZR+AlfA@public.gmane.org): > Cedric Le Goater wrote: > > Eric W. Biederman wrote: > >> Cedric Le Goater <clg-NmTC/0ZBporQT0dZR+AlfA@public.gmane.org> writes: > >> > >>> H. Peter Anvin wrote: > >>>> Cedric Le Goater wrote: > >>>>>> I suggest "newinstance", but "newns" works, too. > >>>>> Could we also use this mount option to 'unshare' a new posix message > >>>>> queue namespace ? > >>>> Sorry, I fail to see the connection with devpts here? Are you > >>>> suggesting using the same option for another filesystem (if so, which)? > >>> yes. the posix message queues are also using a single superblock filesystem. > >>> > >>> If we want isolate them (for container needs for example), we also need to > >>> create a new sb. The patchset I have uses a clone flag but using a mount > >>> 'newns' really sounds like a better idea. > >> Let's call it newinstance if we are going to use the same option for devpts. > > > > ok. > > > >> We can update "current->nsproxy->mqueuens" when the newinstance flag is passed > >> and otherwise we can mount whatever is the current mqueue filesystem for > >> the process. > > > > yes. I'll rebase my previous patchset on this idea. > > Hello Eric, > > I've spent some time on the code and I'm facing some issues with the nsproxy > API if we are to keep the mqueue namespace in nsproxy: > > int copy_namespaces(unsigned long flags, struct task_struct *tsk); > void exit_task_namespaces(struct task_struct *tsk); > void switch_task_namespaces(struct task_struct *tsk, struct nsproxy *new); > void free_nsproxy(struct nsproxy *ns); > int unshare_nsproxy_namespaces(unsigned long, struct nsproxy **, > struct fs_struct *); > > nsproxy designed to work closely with the clone flags and it is not well > suited to be called elsewhere than clone/unshare. > > So I could either : > > (1) make a special case for the mqueue namespace and duplicate some code > to unshare it from ->get_sb() when the option 'newinstance' is used. > > (2) to avoid duplicating code, use a clone_flags to unshare the mqueue > namespace from ->get_sb() when the option 'newinstance' is used. that > sounds silly because we might as well use sys_unshare() in that case. > > (3) move mq_ns out of nsproxy. where shall I put it then ? > > (3.1) task_struct ? > (3.2) mnt namespace maybe ? I think the last one is the way to go. mnt_namespace points to mq_ns. At clone(CLONE_NEWMNT), the new mnt namespace receives a copy of the parent's mq_ns. If a task does mount -o newinstance -t mqueue none /dev/mqueue then its current->nsproxy->mnt_namespace->mqns is switched to point to a new instance of the mq_ns. mnt_ns->mq_ns has pointers to the sb (and hence root dentry) of the devpts fs. When a task does mq_open(name, flag), then name is in the mqueuefs found in current->nsproxy->mnt_namespace->mqns. But if a task does clone(CLONE_NEWMNT); mount --move /dev/mqueue /oldmqueue mount -o newinstance -t mqueue none /dev/mqueue then that task can find files for the old mqueuefs under /oldmqueue, while mq_open() uses /dev/mqueue since that's what it finds through its mnt_namespace. -serge ^ permalink raw reply [flat|nested] 51+ messages in thread
[parent not found: <20080902030426.GB12277-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>]
* Re: [RFC][PATCH 0/8][v2]: Enable multiple mounts of devpts [not found] ` <20080902030426.GB12277-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org> @ 2008-09-02 10:00 ` Eric W. Biederman [not found] ` <m1vdxeeuk0.fsf-B27657KtZYmhTnVgQlOflh2eb7JE58TQ@public.gmane.org> 2008-09-03 11:43 ` Cedric Le Goater 1 sibling, 1 reply; 51+ messages in thread From: Eric W. Biederman @ 2008-09-02 10:00 UTC (permalink / raw) To: Serge E. Hallyn Cc: kyle-hoO6YkzgTuCM0SS3m2neIg, Dave Hansen, bastian-yyjItF7Rl6lg9hUCZPvPmw, Cedric Le Goater, H. Peter Anvin, containers-qjLDD68F18O7TbgM5vRIOg, alan-qBU/x9rampVanCEyBjwyrvXRex20P6io, xemul-GEFAQzZX7r8dnm+yROfE0A "Serge E. Hallyn" <serue-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org> writes: >> (3.2) mnt namespace maybe ? > > I think the last one is the way to go. > > mnt_namespace points to mq_ns. > > At clone(CLONE_NEWMNT), the new mnt namespace receives a copy of the > parent's mq_ns. > > If a task does > mount -o newinstance -t mqueue none /dev/mqueue > then its current->nsproxy->mnt_namespace->mqns is switched > to point to a new instance of the mq_ns. > > mnt_ns->mq_ns has pointers to the sb (and hence root dentry) of the > devpts fs. > > When a task does mq_open(name, flag), then name is in the mqueuefs > found in current->nsproxy->mnt_namespace->mqns. > > But if a task does > > clone(CLONE_NEWMNT); > mount --move /dev/mqueue /oldmqueue > mount -o newinstance -t mqueue none /dev/mqueue > > then that task can find files for the old mqueuefs under > /oldmqueue, while mq_open() uses /dev/mqueue since that's > what it finds through its mnt_namespace. Serge if we can make the lookup a pure mount namespace operation i.e. a well known path. Than I don't have any problems with it. Otherwise it looks like abuse of the mount namespace. In particular. The best approximation I have is to change the kernel to simply lookup "/dev/mqueue" and if not found fallback to the initial kernel instance. I'm staring at the code as I really haven't looked at it enough but it sure looks like we can transform it into a proper filesystem with just a touch of backwards compatibility logic. - put the current mq_namespace in the superblock. - Have open/unlink lookup "/dev/mqueue" to find the filesystem if nothing is found fallback to the internal mount otherwise error. - Possibly put the tunables in a subdirectory? and bind mount that subdirectory on top of /proc/sys/fs/mqueue/ I'm too thrilled about the tunables but still. Are there any security holes or other oddness we would encounter if we did that? If we can turn the posix mqueue stuff into an honest to goodness filesystem then we completely avoid nsproxy, and have something that is much nicer to deal with long term. Eric ^ permalink raw reply [flat|nested] 51+ messages in thread
[parent not found: <m1vdxeeuk0.fsf-B27657KtZYmhTnVgQlOflh2eb7JE58TQ@public.gmane.org>]
* Re: [RFC][PATCH 0/8][v2]: Enable multiple mounts of devpts [not found] ` <m1vdxeeuk0.fsf-B27657KtZYmhTnVgQlOflh2eb7JE58TQ@public.gmane.org> @ 2008-09-02 15:52 ` Serge E. Hallyn [not found] ` <20080902155211.GF8524-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org> 2008-09-03 11:47 ` Cedric Le Goater 1 sibling, 1 reply; 51+ messages in thread From: Serge E. Hallyn @ 2008-09-02 15:52 UTC (permalink / raw) To: Eric W. Biederman Cc: kyle-hoO6YkzgTuCM0SS3m2neIg, Dave Hansen, bastian-yyjItF7Rl6lg9hUCZPvPmw, Cedric Le Goater, H. Peter Anvin, containers-qjLDD68F18O7TbgM5vRIOg, alan-qBU/x9rampVanCEyBjwyrvXRex20P6io, xemul-GEFAQzZX7r8dnm+yROfE0A Quoting Eric W. Biederman (ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org): > "Serge E. Hallyn" <serue-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org> writes: > > >> (3.2) mnt namespace maybe ? > > > > I think the last one is the way to go. > > > > mnt_namespace points to mq_ns. > > > > At clone(CLONE_NEWMNT), the new mnt namespace receives a copy of the > > parent's mq_ns. > > > > If a task does > > mount -o newinstance -t mqueue none /dev/mqueue > > then its current->nsproxy->mnt_namespace->mqns is switched > > to point to a new instance of the mq_ns. > > > > mnt_ns->mq_ns has pointers to the sb (and hence root dentry) of the > > devpts fs. > > > > When a task does mq_open(name, flag), then name is in the mqueuefs > > found in current->nsproxy->mnt_namespace->mqns. > > > > But if a task does > > > > clone(CLONE_NEWMNT); > > mount --move /dev/mqueue /oldmqueue > > mount -o newinstance -t mqueue none /dev/mqueue > > > > then that task can find files for the old mqueuefs under > > /oldmqueue, while mq_open() uses /dev/mqueue since that's > > what it finds through its mnt_namespace. > > Serge if we can make the lookup a pure mount namespace operation > i.e. a well known path. Than I don't have any problems with it. > Otherwise it looks like abuse of the mount namespace. Why? Actually it may work to just put mq_ns straight in the nsproxy. So let's see: mq_open(name, flag): opens name under the dentry pointed to by current->nsproxy->mq_ns->mq_dentry mount -t mqueue none /dev/mqueue: either returns -EBUSY or just mounts current->nsproxy->mq_ns->mq_sb under /dev/mqueue mount -o newinstance -t mqueue none /dev/mqueue: mounts a new mq_ns instance under /dev/mqueue While doing mount --make-rshared /vs1 mount --bind /dev/mqueue /vs1/dev/mqueue create_a_new_container_chrooted_at(/vs1) mount -o newinstance -t mqueue none /dev/mqueue would allow the host to see the child's /dev/mqueue under /vs1/dev/mqueue while having its own mqueuefs continue to be mounted under /dev/mqueue. > In particular. The best approximation I have is to change the > kernel to simply lookup "/dev/mqueue" and if not found fallback > to the initial kernel instance. Having the kernel walk a hard-coded pathname to find it? That I really don't like. > I'm staring at the code as I really haven't looked at it enough > but it sure looks like we can transform it into a proper filesystem > with just a touch of backwards compatibility logic. > - put the current mq_namespace in the superblock. > - Have open/unlink lookup "/dev/mqueue" to find the filesystem > if nothing is found fallback to the internal mount otherwise error. > - Possibly put the tunables in a subdirectory? and > bind mount that subdirectory on top of /proc/sys/fs/mqueue/ > > I'm too thrilled about the tunables but still. If mq_ns is stored under nsproxy, then so long as the task has remounted /proc for its pidns anyway, we should be able to show the right tunables under /proc as well, right? > Are there any security holes or other oddness we would encounter > if we did that? > > If we can turn the posix mqueue stuff into an honest to goodness > filesystem then we completely avoid nsproxy, I am assuming that mq_open() is posix-defined? So the only way we could do that is, as you suggest, to have the kernel look up the hard-coded /dev/mqueue path which IMO is a non-starter, and not worth it to avoid nsproxy. > and have something > that is much nicer to deal with long term. > > Eric ^ permalink raw reply [flat|nested] 51+ messages in thread
[parent not found: <20080902155211.GF8524-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>]
* Re: [RFC][PATCH 0/8][v2]: Enable multiple mounts of devpts [not found] ` <20080902155211.GF8524-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org> @ 2008-09-03 12:01 ` Cedric Le Goater [not found] ` <48BE7C98.1040004-NmTC/0ZBporQT0dZR+AlfA@public.gmane.org> 0 siblings, 1 reply; 51+ messages in thread From: Cedric Le Goater @ 2008-09-03 12:01 UTC (permalink / raw) To: Serge E. Hallyn Cc: kyle-hoO6YkzgTuCM0SS3m2neIg, Dave Hansen, bastian-yyjItF7Rl6lg9hUCZPvPmw, Eric W. Biederman, H. Peter Anvin, containers-qjLDD68F18O7TbgM5vRIOg, alan-qBU/x9rampVanCEyBjwyrvXRex20P6io, xemul-GEFAQzZX7r8dnm+yROfE0A Serge E. Hallyn wrote: > Quoting Eric W. Biederman (ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org): >> "Serge E. Hallyn" <serue-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org> writes: >> >>>> (3.2) mnt namespace maybe ? >>> I think the last one is the way to go. >>> >>> mnt_namespace points to mq_ns. >>> >>> At clone(CLONE_NEWMNT), the new mnt namespace receives a copy of the >>> parent's mq_ns. >>> >>> If a task does >>> mount -o newinstance -t mqueue none /dev/mqueue >>> then its current->nsproxy->mnt_namespace->mqns is switched >>> to point to a new instance of the mq_ns. >>> >>> mnt_ns->mq_ns has pointers to the sb (and hence root dentry) of the >>> devpts fs. >>> >>> When a task does mq_open(name, flag), then name is in the mqueuefs >>> found in current->nsproxy->mnt_namespace->mqns. >>> >>> But if a task does >>> >>> clone(CLONE_NEWMNT); >>> mount --move /dev/mqueue /oldmqueue >>> mount -o newinstance -t mqueue none /dev/mqueue >>> >>> then that task can find files for the old mqueuefs under >>> /oldmqueue, while mq_open() uses /dev/mqueue since that's >>> what it finds through its mnt_namespace. >> Serge if we can make the lookup a pure mount namespace operation >> i.e. a well known path. Than I don't have any problems with it. >> Otherwise it looks like abuse of the mount namespace. > > Why? > > Actually it may work to just put mq_ns straight in the nsproxy. ok. that's the path I was taking. > So let's see: > > mq_open(name, flag): opens name under the dentry pointed > to by current->nsproxy->mq_ns->mq_dentry > mount -t mqueue none /dev/mqueue: either returns -EBUSY > or just mounts current->nsproxy->mq_ns->mq_sb > under /dev/mqueue > mount -o newinstance -t mqueue none /dev/mqueue: mounts > a new mq_ns instance under /dev/mqueue > > While doing > mount --make-rshared /vs1 > mount --bind /dev/mqueue /vs1/dev/mqueue > create_a_new_container_chrooted_at(/vs1) > mount -o newinstance -t mqueue none /dev/mqueue > would allow the host to see the child's /dev/mqueue under > /vs1/dev/mqueue while having its own mqueuefs continue to be > mounted under /dev/mqueue. ok. complete isolation would require 2 steps. I guess this is acceptable because mq uses a fs allowing the host to see the child's /dev/mqueue is also 'a nice to have' feature. unfortunately, we can't do that for all namespaces, for sysvipc for example. So I'm wondering if we should allow it at all ? Thanks, C. ^ permalink raw reply [flat|nested] 51+ messages in thread
[parent not found: <48BE7C98.1040004-NmTC/0ZBporQT0dZR+AlfA@public.gmane.org>]
* Re: [RFC][PATCH 0/8][v2]: Enable multiple mounts of devpts [not found] ` <48BE7C98.1040004-NmTC/0ZBporQT0dZR+AlfA@public.gmane.org> @ 2008-09-03 13:12 ` Eric W. Biederman 2008-09-03 13:41 ` Serge E. Hallyn 1 sibling, 0 replies; 51+ messages in thread From: Eric W. Biederman @ 2008-09-03 13:12 UTC (permalink / raw) To: Cedric Le Goater Cc: kyle-hoO6YkzgTuCM0SS3m2neIg, Dave Hansen, bastian-yyjItF7Rl6lg9hUCZPvPmw, H. Peter Anvin, containers-qjLDD68F18O7TbgM5vRIOg, alan-qBU/x9rampVanCEyBjwyrvXRex20P6io, xemul-GEFAQzZX7r8dnm+yROfE0A Cedric Le Goater <clg-NmTC/0ZBporQT0dZR+AlfA@public.gmane.org> writes: > ok. complete isolation would require 2 steps. I guess this is > acceptable because mq uses a fs > > allowing the host to see the child's /dev/mqueue is also 'a nice > to have' feature. unfortunately, we can't do that for all namespaces, > for sysvipc for example. So I'm wondering if we should allow it > at all ? Definitely. One of the lessons from the people doing monitoring is that it really is best done through a filesystem interface. You don't have to have it mounted and there are times you don't want to be able to mount a view into another namespace but having the option is nice. I'm torn because the more I look at the way posix message queues are implemented the more it looks like new versions of sys_open and sys_unlink should never have been written and it should have been a user space convention to always mount mqueuefs on /dev/mqueue. Just doing newinstance and having a pointer in nsproxy will get the job done, but it feels like we may have the opportunity to correct a blunder in the initial implementation. Eric ^ permalink raw reply [flat|nested] 51+ messages in thread
* Re: [RFC][PATCH 0/8][v2]: Enable multiple mounts of devpts [not found] ` <48BE7C98.1040004-NmTC/0ZBporQT0dZR+AlfA@public.gmane.org> 2008-09-03 13:12 ` Eric W. Biederman @ 2008-09-03 13:41 ` Serge E. Hallyn 1 sibling, 0 replies; 51+ messages in thread From: Serge E. Hallyn @ 2008-09-03 13:41 UTC (permalink / raw) To: Cedric Le Goater Cc: kyle-hoO6YkzgTuCM0SS3m2neIg, Dave Hansen, bastian-yyjItF7Rl6lg9hUCZPvPmw, Eric W. Biederman, H. Peter Anvin, containers-qjLDD68F18O7TbgM5vRIOg, alan-qBU/x9rampVanCEyBjwyrvXRex20P6io, xemul-GEFAQzZX7r8dnm+yROfE0A Quoting Cedric Le Goater (clg-NmTC/0ZBporQT0dZR+AlfA@public.gmane.org): > Serge E. Hallyn wrote: > > Quoting Eric W. Biederman (ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org): > >> "Serge E. Hallyn" <serue-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org> writes: > >> > >>>> (3.2) mnt namespace maybe ? > >>> I think the last one is the way to go. > >>> > >>> mnt_namespace points to mq_ns. > >>> > >>> At clone(CLONE_NEWMNT), the new mnt namespace receives a copy of the > >>> parent's mq_ns. > >>> > >>> If a task does > >>> mount -o newinstance -t mqueue none /dev/mqueue > >>> then its current->nsproxy->mnt_namespace->mqns is switched > >>> to point to a new instance of the mq_ns. > >>> > >>> mnt_ns->mq_ns has pointers to the sb (and hence root dentry) of the > >>> devpts fs. > >>> > >>> When a task does mq_open(name, flag), then name is in the mqueuefs > >>> found in current->nsproxy->mnt_namespace->mqns. > >>> > >>> But if a task does > >>> > >>> clone(CLONE_NEWMNT); > >>> mount --move /dev/mqueue /oldmqueue > >>> mount -o newinstance -t mqueue none /dev/mqueue > >>> > >>> then that task can find files for the old mqueuefs under > >>> /oldmqueue, while mq_open() uses /dev/mqueue since that's > >>> what it finds through its mnt_namespace. > >> Serge if we can make the lookup a pure mount namespace operation > >> i.e. a well known path. Than I don't have any problems with it. > >> Otherwise it looks like abuse of the mount namespace. > > > > Why? > > > > Actually it may work to just put mq_ns straight in the nsproxy. > > ok. that's the path I was taking. > > > So let's see: > > > > mq_open(name, flag): opens name under the dentry pointed > > to by current->nsproxy->mq_ns->mq_dentry > > mount -t mqueue none /dev/mqueue: either returns -EBUSY > > or just mounts current->nsproxy->mq_ns->mq_sb > > under /dev/mqueue > > mount -o newinstance -t mqueue none /dev/mqueue: mounts > > a new mq_ns instance under /dev/mqueue > > > > While doing > > mount --make-rshared /vs1 > > mount --bind /dev/mqueue /vs1/dev/mqueue > > create_a_new_container_chrooted_at(/vs1) > > mount -o newinstance -t mqueue none /dev/mqueue > > would allow the host to see the child's /dev/mqueue under > > /vs1/dev/mqueue while having its own mqueuefs continue to be > > mounted under /dev/mqueue. > > ok. complete isolation would require 2 steps. I guess this is > acceptable because mq uses a fs What do you mean, it would require 2 steps? You mean umount followed by a mount? Not really, since /dev/mqueue never needed to be bind-mounted under /vs1/dev/mqueue to begin with, so all the container has to do is mount -o newinstance -t mqueue none /dev/mqueue (while chrooted under /vs1) IMO two steps means unshare(CLONE_NEWIPC) and mount /dev/mqueue, which is what the last patchset required. > allowing the host to see the child's /dev/mqueue is also 'a nice > to have' feature. unfortunately, we can't do that for all namespaces, > for sysvipc for example. So I'm wondering if we should allow it > at all ? > > Thanks, > > C. ^ permalink raw reply [flat|nested] 51+ messages in thread
* Re: [RFC][PATCH 0/8][v2]: Enable multiple mounts of devpts [not found] ` <m1vdxeeuk0.fsf-B27657KtZYmhTnVgQlOflh2eb7JE58TQ@public.gmane.org> 2008-09-02 15:52 ` Serge E. Hallyn @ 2008-09-03 11:47 ` Cedric Le Goater [not found] ` <48BE7959.1080109-NmTC/0ZBporQT0dZR+AlfA@public.gmane.org> 1 sibling, 1 reply; 51+ messages in thread From: Cedric Le Goater @ 2008-09-03 11:47 UTC (permalink / raw) To: Eric W. Biederman Cc: kyle-hoO6YkzgTuCM0SS3m2neIg, Dave Hansen, bastian-yyjItF7Rl6lg9hUCZPvPmw, H. Peter Anvin, containers-qjLDD68F18O7TbgM5vRIOg, alan-qBU/x9rampVanCEyBjwyrvXRex20P6io, xemul-GEFAQzZX7r8dnm+yROfE0A Eric W. Biederman wrote: > "Serge E. Hallyn" <serue-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org> writes: > >>> (3.2) mnt namespace maybe ? >> I think the last one is the way to go. >> >> mnt_namespace points to mq_ns. >> >> At clone(CLONE_NEWMNT), the new mnt namespace receives a copy of the >> parent's mq_ns. >> >> If a task does >> mount -o newinstance -t mqueue none /dev/mqueue >> then its current->nsproxy->mnt_namespace->mqns is switched >> to point to a new instance of the mq_ns. >> >> mnt_ns->mq_ns has pointers to the sb (and hence root dentry) of the >> devpts fs. >> >> When a task does mq_open(name, flag), then name is in the mqueuefs >> found in current->nsproxy->mnt_namespace->mqns. >> >> But if a task does >> >> clone(CLONE_NEWMNT); >> mount --move /dev/mqueue /oldmqueue >> mount -o newinstance -t mqueue none /dev/mqueue >> >> then that task can find files for the old mqueuefs under >> /oldmqueue, while mq_open() uses /dev/mqueue since that's >> what it finds through its mnt_namespace. > > Serge if we can make the lookup a pure mount namespace operation > i.e. a well known path. Than I don't have any problems with it. > Otherwise it looks like abuse of the mount namespace. > > In particular. The best approximation I have is to change the > kernel to simply lookup "/dev/mqueue" and if not found fallback > to the initial kernel instance. > > I'm staring at the code as I really haven't looked at it enough > but it sure looks like we can transform it into a proper filesystem > with just a touch of backwards compatibility logic. > - put the current mq_namespace in the superblock. ok that is done. using the s_fs_info. > - Have open/unlink lookup "/dev/mqueue" to find the filesystem > if nothing is found fallback to the internal mount otherwise error. what do you mean ? loop on the mnt_namespace of current to find a 'struct vfsmount' pointing to /dev/mqueue ? C. > - Possibly put the tunables in a subdirectory? and > bind mount that subdirectory on top of /proc/sys/fs/mqueue/ > > I'm too thrilled about the tunables but still. > > Are there any security holes or other oddness we would encounter > if we did that? > > If we can turn the posix mqueue stuff into an honest to goodness > filesystem then we completely avoid nsproxy, and have something > that is much nicer to deal with long term. > > Eric ^ permalink raw reply [flat|nested] 51+ messages in thread
[parent not found: <48BE7959.1080109-NmTC/0ZBporQT0dZR+AlfA@public.gmane.org>]
* Re: [RFC][PATCH 0/8][v2]: Enable multiple mounts of devpts [not found] ` <48BE7959.1080109-NmTC/0ZBporQT0dZR+AlfA@public.gmane.org> @ 2008-09-03 13:24 ` Serge E. Hallyn 0 siblings, 0 replies; 51+ messages in thread From: Serge E. Hallyn @ 2008-09-03 13:24 UTC (permalink / raw) To: Cedric Le Goater Cc: kyle-hoO6YkzgTuCM0SS3m2neIg, Dave Hansen, bastian-yyjItF7Rl6lg9hUCZPvPmw, Eric W. Biederman, H. Peter Anvin, containers-qjLDD68F18O7TbgM5vRIOg, alan-qBU/x9rampVanCEyBjwyrvXRex20P6io, xemul-GEFAQzZX7r8dnm+yROfE0A Quoting Cedric Le Goater (clg-NmTC/0ZBporQT0dZR+AlfA@public.gmane.org): > Eric W. Biederman wrote: > > "Serge E. Hallyn" <serue-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org> writes: > > > >>> (3.2) mnt namespace maybe ? > >> I think the last one is the way to go. > >> > >> mnt_namespace points to mq_ns. > >> > >> At clone(CLONE_NEWMNT), the new mnt namespace receives a copy of the > >> parent's mq_ns. > >> > >> If a task does > >> mount -o newinstance -t mqueue none /dev/mqueue > >> then its current->nsproxy->mnt_namespace->mqns is switched > >> to point to a new instance of the mq_ns. > >> > >> mnt_ns->mq_ns has pointers to the sb (and hence root dentry) of the > >> devpts fs. > >> > >> When a task does mq_open(name, flag), then name is in the mqueuefs > >> found in current->nsproxy->mnt_namespace->mqns. > >> > >> But if a task does > >> > >> clone(CLONE_NEWMNT); > >> mount --move /dev/mqueue /oldmqueue > >> mount -o newinstance -t mqueue none /dev/mqueue > >> > >> then that task can find files for the old mqueuefs under > >> /oldmqueue, while mq_open() uses /dev/mqueue since that's > >> what it finds through its mnt_namespace. > > > > Serge if we can make the lookup a pure mount namespace operation > > i.e. a well known path. Than I don't have any problems with it. > > Otherwise it looks like abuse of the mount namespace. > > > > In particular. The best approximation I have is to change the > > kernel to simply lookup "/dev/mqueue" and if not found fallback > > to the initial kernel instance. > > > > I'm staring at the code as I really haven't looked at it enough > > but it sure looks like we can transform it into a proper filesystem > > with just a touch of backwards compatibility logic. > > - put the current mq_namespace in the superblock. > > ok that is done. using the s_fs_info. > > > - Have open/unlink lookup "/dev/mqueue" to find the filesystem > > if nothing is found fallback to the internal mount otherwise error. > > what do you mean ? loop on the mnt_namespace of current to find a > 'struct vfsmount' pointing to /dev/mqueue ? I think he means start a lookup starting from current->fs->root to find /dev/mqueue. -serge ^ permalink raw reply [flat|nested] 51+ messages in thread
* Re: [RFC][PATCH 0/8][v2]: Enable multiple mounts of devpts [not found] ` <20080902030426.GB12277-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org> 2008-09-02 10:00 ` Eric W. Biederman @ 2008-09-03 11:43 ` Cedric Le Goater [not found] ` <48BE7845.6070500-NmTC/0ZBporQT0dZR+AlfA@public.gmane.org> 1 sibling, 1 reply; 51+ messages in thread From: Cedric Le Goater @ 2008-09-03 11:43 UTC (permalink / raw) To: Serge E. Hallyn Cc: kyle-hoO6YkzgTuCM0SS3m2neIg, Dave Hansen, bastian-yyjItF7Rl6lg9hUCZPvPmw, Eric W. Biederman, H. Peter Anvin, containers-qjLDD68F18O7TbgM5vRIOg, alan-qBU/x9rampVanCEyBjwyrvXRex20P6io, xemul-GEFAQzZX7r8dnm+yROfE0A >> >> (3) move mq_ns out of nsproxy. where shall I put it then ? >> >> (3.1) task_struct ? >> (3.2) mnt namespace maybe ? > > I think the last one is the way to go. > > mnt_namespace points to mq_ns. > > At clone(CLONE_NEWMNT), the new mnt namespace receives a copy of the > parent's mq_ns. hmm, hmm, hmm, I still thinking about this. > If a task does > mount -o newinstance -t mqueue none /dev/mqueue > then its current->nsproxy->mnt_namespace->mqns is switched > to point to a new instance of the mq_ns. > > mnt_ns->mq_ns has pointers to the sb (and hence root dentry) of the > devpts fs. [trying to understand what you have in mind ] why not keep the 'struct vfsmount' in the mq_ns, as the code is doing today ? the vfsmount holds both the root dentry and the superblock. > When a task does mq_open(name, flag), then name is in the mqueuefs > found in current->nsproxy->mnt_namespace->mqns. > > But if a task does > > clone(CLONE_NEWMNT); > mount --move /dev/mqueue /oldmqueue > mount -o newinstance -t mqueue none /dev/mqueue > > then that task can find files for the old mqueuefs under > /oldmqueue, while mq_open() uses /dev/mqueue since that's > what it finds through its mnt_namespace. That I don't like. Even though posix mqueue objects can outlive a process, I don't think a process should be able to peek and poke in a message queue namespace other than his. this is the basic principle of the namespaces : isolation. Am I wrong ? couldn't we just return EACCES ? (not posix) C. ^ permalink raw reply [flat|nested] 51+ messages in thread
[parent not found: <48BE7845.6070500-NmTC/0ZBporQT0dZR+AlfA@public.gmane.org>]
* Re: [RFC][PATCH 0/8][v2]: Enable multiple mounts of devpts [not found] ` <48BE7845.6070500-NmTC/0ZBporQT0dZR+AlfA@public.gmane.org> @ 2008-09-03 13:23 ` Serge E. Hallyn [not found] ` <20080903132307.GA9527-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org> 0 siblings, 1 reply; 51+ messages in thread From: Serge E. Hallyn @ 2008-09-03 13:23 UTC (permalink / raw) To: Cedric Le Goater Cc: kyle-hoO6YkzgTuCM0SS3m2neIg, Dave Hansen, bastian-yyjItF7Rl6lg9hUCZPvPmw, Eric W. Biederman, H. Peter Anvin, containers-qjLDD68F18O7TbgM5vRIOg, alan-qBU/x9rampVanCEyBjwyrvXRex20P6io, xemul-GEFAQzZX7r8dnm+yROfE0A Quoting Cedric Le Goater (clg-NmTC/0ZBporQT0dZR+AlfA@public.gmane.org): > >> > >> (3) move mq_ns out of nsproxy. where shall I put it then ? > >> > >> (3.1) task_struct ? > >> (3.2) mnt namespace maybe ? > > > > I think the last one is the way to go. > > > > mnt_namespace points to mq_ns. > > > > At clone(CLONE_NEWMNT), the new mnt namespace receives a copy of the > > parent's mq_ns. > > hmm, hmm, hmm, I still thinking about this. > > > If a task does > > mount -o newinstance -t mqueue none /dev/mqueue > > then its current->nsproxy->mnt_namespace->mqns is switched > > to point to a new instance of the mq_ns. > > > > mnt_ns->mq_ns has pointers to the sb (and hence root dentry) of the > > devpts fs. > > [trying to understand what you have in mind ] > > why not keep the 'struct vfsmount' in the mq_ns, as the code is doing > today ? the vfsmount holds both the root dentry and the superblock. Yeah that's fine :) > > When a task does mq_open(name, flag), then name is in the mqueuefs > > found in current->nsproxy->mnt_namespace->mqns. > > > > But if a task does > > > > clone(CLONE_NEWMNT); > > mount --move /dev/mqueue /oldmqueue > > mount -o newinstance -t mqueue none /dev/mqueue > > > > then that task can find files for the old mqueuefs under > > /oldmqueue, while mq_open() uses /dev/mqueue since that's > > what it finds through its mnt_namespace. > > That I don't like. > > Even though posix mqueue objects can outlive a process, I don't think > a process should be able to peek and poke in a message queue namespace > other than his. this is the basic principle of the namespaces : > isolation. Am I wrong ? Yes you are wrong in this case. In particular consider mounts propagation, which allows you to to examine both a child namespace's mounts namespace, and, through the child's /sys (presumably mounted under /vs1/sys), his network namespace and eventually device namespace. > couldn't we just return EACCES ? (not posix) We could. And if we think there is really no value in viewing a child namespace's mqueuefs then we may as well. I just want to make it clear that the proposed semantics support it as an option. -serge ^ permalink raw reply [flat|nested] 51+ messages in thread
[parent not found: <20080903132307.GA9527-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>]
* Re: [RFC][PATCH 0/8][v2]: Enable multiple mounts of devpts [not found] ` <20080903132307.GA9527-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org> @ 2008-09-03 13:52 ` Cedric Le Goater 0 siblings, 0 replies; 51+ messages in thread From: Cedric Le Goater @ 2008-09-03 13:52 UTC (permalink / raw) To: Serge E. Hallyn Cc: kyle-hoO6YkzgTuCM0SS3m2neIg, Dave Hansen, bastian-yyjItF7Rl6lg9hUCZPvPmw, Eric W. Biederman, H. Peter Anvin, containers-qjLDD68F18O7TbgM5vRIOg, alan-qBU/x9rampVanCEyBjwyrvXRex20P6io, xemul-GEFAQzZX7r8dnm+yROfE0A >>> When a task does mq_open(name, flag), then name is in the mqueuefs >>> found in current->nsproxy->mnt_namespace->mqns. >>> >>> But if a task does >>> >>> clone(CLONE_NEWMNT); >>> mount --move /dev/mqueue /oldmqueue >>> mount -o newinstance -t mqueue none /dev/mqueue >>> >>> then that task can find files for the old mqueuefs under >>> /oldmqueue, while mq_open() uses /dev/mqueue since that's >>> what it finds through its mnt_namespace. >> That I don't like. >> >> Even though posix mqueue objects can outlive a process, I don't think >> a process should be able to peek and poke in a message queue namespace >> other than his. this is the basic principle of the namespaces : >> isolation. Am I wrong ? > > Yes you are wrong in this case. In particular consider mounts > propagation, which allows you to to examine both a child namespace's > mounts namespace, and, through the child's /sys (presumably > mounted under /vs1/sys), his network namespace and eventually > device namespace. but there are some accounting done which could be fooled, like the max number of mqueues or the number of bytes used by the mqueues which is on a user_struct basis. >> couldn't we just return EACCES ? (not posix) > > We could. And if we think there is really no value in viewing > a child namespace's mqueuefs then we may as well. I just want > to make it clear that the proposed semantics support it as an > option. This make sense C. ^ permalink raw reply [flat|nested] 51+ messages in thread
* Re: [RFC][PATCH 0/8][v2]: Enable multiple mounts of devpts [not found] ` <48B7BB3C.5080404-NmTC/0ZBporQT0dZR+AlfA@public.gmane.org> 2008-09-02 3:04 ` Serge E. Hallyn @ 2008-09-02 9:22 ` Eric W. Biederman [not found] ` <m1d4jmhpgl.fsf-B27657KtZYmhTnVgQlOflh2eb7JE58TQ@public.gmane.org> 1 sibling, 1 reply; 51+ messages in thread From: Eric W. Biederman @ 2008-09-02 9:22 UTC (permalink / raw) To: Cedric Le Goater Cc: kyle-hoO6YkzgTuCM0SS3m2neIg, Dave Hansen, bastian-yyjItF7Rl6lg9hUCZPvPmw, H. Peter Anvin, containers-qjLDD68F18O7TbgM5vRIOg, alan-qBU/x9rampVanCEyBjwyrvXRex20P6io, xemul-GEFAQzZX7r8dnm+yROfE0A Cedric Le Goater <clg-NmTC/0ZBporQT0dZR+AlfA@public.gmane.org> writes: > Hello Eric, > > I've spent some time on the code and I'm facing some issues with the nsproxy > API if we are to keep the mqueue namespace in nsproxy: > > int copy_namespaces(unsigned long flags, struct task_struct *tsk); > void exit_task_namespaces(struct task_struct *tsk); > void switch_task_namespaces(struct task_struct *tsk, struct nsproxy > *new); > void free_nsproxy(struct nsproxy *ns); > int unshare_nsproxy_namespaces(unsigned long, struct nsproxy **, > struct fs_struct *); > > nsproxy designed to work closely with the clone flags and it is not well > suited to be called elsewhere than clone/unshare. Right. > So I could either : > > (1) make a special case for the mqueue namespace and duplicate some code > to unshare it from ->get_sb() when the option 'newinstance' is used. This is probably the best option. You should be able to just wrap create_new_namespaces(flags=0,...) to create a new one. And then actually perform the unshare. Which means in practice you will bump the ref count on mq_ns and then drop it and insert a new mq_ns pointer but overall that shouldn't be too bad. > (2) to avoid duplicating code, use a clone_flags to unshare the mqueue > namespace from ->get_sb() when the option 'newinstance' is used. that > sounds silly because we might as well use sys_unshare() in that case. We should be able to refactor the code so we don't have lots of duplication. > (3) move mq_ns out of nsproxy. where shall I put it then ? > > (3.1) task_struct ? That is almost interesting. > (3.2) mnt namespace maybe ? > > BTW, have you taken a look at what dave resent in July ? I took a quick look. > https://lists.linux-foundation.org/pipermail/containers/2008-July/011776.html > > This is the mqueue namespace patchset using CLONE_NEWIPC to unshare. > > I wonder if there are arguments against that approach. I might have > forgotten some of the issues with unshare as I haven't looked at the > code for some time. I don't know that there are any big issues. But there is a certain purity in saying the mq namespaces is it's own weird global namespace that isn't connected to anything else. Which unfortunately happens to be the truth. Eric ^ permalink raw reply [flat|nested] 51+ messages in thread
[parent not found: <m1d4jmhpgl.fsf-B27657KtZYmhTnVgQlOflh2eb7JE58TQ@public.gmane.org>]
* Re: [RFC][PATCH 0/8][v2]: Enable multiple mounts of devpts [not found] ` <m1d4jmhpgl.fsf-B27657KtZYmhTnVgQlOflh2eb7JE58TQ@public.gmane.org> @ 2008-09-03 12:04 ` Cedric Le Goater 0 siblings, 0 replies; 51+ messages in thread From: Cedric Le Goater @ 2008-09-03 12:04 UTC (permalink / raw) To: Eric W. Biederman Cc: kyle-hoO6YkzgTuCM0SS3m2neIg, Dave Hansen, bastian-yyjItF7Rl6lg9hUCZPvPmw, H. Peter Anvin, containers-qjLDD68F18O7TbgM5vRIOg, alan-qBU/x9rampVanCEyBjwyrvXRex20P6io, xemul-GEFAQzZX7r8dnm+yROfE0A Eric W. Biederman wrote: > Cedric Le Goater <clg-NmTC/0ZBporQT0dZR+AlfA@public.gmane.org> writes: >> Hello Eric, >> >> I've spent some time on the code and I'm facing some issues with the nsproxy >> API if we are to keep the mqueue namespace in nsproxy: >> >> int copy_namespaces(unsigned long flags, struct task_struct *tsk); >> void exit_task_namespaces(struct task_struct *tsk); >> void switch_task_namespaces(struct task_struct *tsk, struct nsproxy >> *new); >> void free_nsproxy(struct nsproxy *ns); >> int unshare_nsproxy_namespaces(unsigned long, struct nsproxy **, >> struct fs_struct *); >> >> nsproxy designed to work closely with the clone flags and it is not well >> suited to be called elsewhere than clone/unshare. > > Right. > >> So I could either : >> >> (1) make a special case for the mqueue namespace and duplicate some code >> to unshare it from ->get_sb() when the option 'newinstance' is used. > > This is probably the best option. > > You should be able to just wrap create_new_namespaces(flags=0,...) > to create a new one. > > And then actually perform the unshare. > > Which means in practice you will bump the ref count on > mq_ns and then drop it and insert a new mq_ns pointer but overall > that shouldn't be too bad. Thanks for the comments, Eric. I should have something ready soon. I'm currently reviewing the C/R patches that have been sent recently. C. ^ permalink raw reply [flat|nested] 51+ messages in thread
end of thread, other threads:[~2008-09-03 13:52 UTC | newest]
Thread overview: 51+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2008-08-21 2:21 [RFC][PATCH 0/8][v2]: Enable multiple mounts of devpts sukadev-r/Jw6+rmf7HQT0dZR+AlfA
[not found] ` <20080821022126.GA29449-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
2008-08-21 2:26 ` [RFC][PATCH 1/8]: /dev/tty tweak in init_dev() sukadev-r/Jw6+rmf7HQT0dZR+AlfA
[not found] ` <20080821022621.GA29658-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
2008-08-21 9:19 ` Alan Cox
2008-08-21 9:26 ` Alan Cox
2008-08-21 2:26 ` [RFC][PATCH 2/8]: Add inode parameter devpts interfaces sukadev-r/Jw6+rmf7HQT0dZR+AlfA
2008-08-21 2:27 ` [RFC][PATCH 3/8]: Remove devpts_root global sukadev-r/Jw6+rmf7HQT0dZR+AlfA
2008-08-21 2:27 ` [RFC][PATCH 4/8]: Per-mount allocated_ptys sukadev-r/Jw6+rmf7HQT0dZR+AlfA
2008-08-21 2:28 ` [RFC][PATCH 5/8]: Per-mount 'config' object sukadev-r/Jw6+rmf7HQT0dZR+AlfA
2008-08-21 2:28 ` [RFC][PATCH 6/8]: Extract option parsing to new function sukadev-r/Jw6+rmf7HQT0dZR+AlfA
2008-08-21 2:29 ` [RFC][PATCH 7/8]: Auto-create ptmx node when mounting devpts sukadev-r/Jw6+rmf7HQT0dZR+AlfA
[not found] ` <20080821022908.GG29658-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
2008-08-21 9:21 ` Alan Cox
[not found] ` <20080821102139.43c44f67-qBU/x9rampVanCEyBjwyrvXRex20P6io@public.gmane.org>
2008-08-21 16:09 ` H. Peter Anvin
[not found] ` <48AD932F.8090908-YMNOUZJC4hwAvxtiuMwx3w@public.gmane.org>
2008-08-21 16:27 ` Alan Cox
[not found] ` <20080821172700.781b0011-qBU/x9rampVanCEyBjwyrvXRex20P6io@public.gmane.org>
2008-08-21 16:49 ` H. Peter Anvin
[not found] ` <48AD9C93.6080302-YMNOUZJC4hwAvxtiuMwx3w@public.gmane.org>
2008-08-21 17:22 ` Serge E. Hallyn
[not found] ` <20080821172245.GA28411-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
2008-08-21 17:07 ` Alan Cox
2008-08-21 17:23 ` sukadev-r/Jw6+rmf7HQT0dZR+AlfA
[not found] ` <20080821172342.GA8059-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
2008-08-21 17:38 ` Eric W. Biederman
[not found] ` <m18wuqtgj7.fsf-B27657KtZYmhTnVgQlOflh2eb7JE58TQ@public.gmane.org>
2008-08-21 17:50 ` H. Peter Anvin
[not found] ` <48ADAAE2.6040700-YMNOUZJC4hwAvxtiuMwx3w@public.gmane.org>
2008-08-21 18:23 ` Eric W. Biederman
[not found] ` <m1hc9eqlbo.fsf-B27657KtZYmhTnVgQlOflh2eb7JE58TQ@public.gmane.org>
2008-08-21 18:36 ` H. Peter Anvin
2008-08-21 17:40 ` H. Peter Anvin
[not found] ` <48ADA890.4060309-YMNOUZJC4hwAvxtiuMwx3w@public.gmane.org>
2008-08-21 18:11 ` sukadev-r/Jw6+rmf7HQT0dZR+AlfA
[not found] ` <20080821181133.GB8059-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
2008-08-21 18:17 ` H. Peter Anvin
2008-08-21 21:00 ` Serge E. Hallyn
[not found] ` <20080821210040.GA14532-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
2008-08-21 22:16 ` sukadev-r/Jw6+rmf7HQT0dZR+AlfA
2008-08-21 2:29 ` [RFC][PATCH 8/8]: Enable multiple mounts of devpts sukadev-r/Jw6+rmf7HQT0dZR+AlfA
2008-08-21 2:45 ` [RFC][PATCH 0/8][v2]: " H. Peter Anvin
[not found] ` <48ACD6CB.5030706-YMNOUZJC4hwAvxtiuMwx3w@public.gmane.org>
2008-08-21 3:10 ` sukadev-r/Jw6+rmf7HQT0dZR+AlfA
[not found] ` <20080821031028.GB30205-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
2008-08-21 3:15 ` H. Peter Anvin
[not found] ` <48ACDDC7.3000704-YMNOUZJC4hwAvxtiuMwx3w@public.gmane.org>
2008-08-21 16:34 ` Cedric Le Goater
[not found] ` <48AD991F.9010906-NmTC/0ZBporQT0dZR+AlfA@public.gmane.org>
2008-08-21 16:40 ` H. Peter Anvin
[not found] ` <48AD9A97.6000807-YMNOUZJC4hwAvxtiuMwx3w@public.gmane.org>
2008-08-21 16:54 ` Cedric Le Goater
[not found] ` <48AD9DCD.3060306-NmTC/0ZBporQT0dZR+AlfA@public.gmane.org>
2008-08-21 16:56 ` H. Peter Anvin
2008-08-21 17:28 ` Serge E. Hallyn
2008-08-21 17:45 ` Eric W. Biederman
[not found] ` <m1fxoys1ng.fsf-B27657KtZYmhTnVgQlOflh2eb7JE58TQ@public.gmane.org>
2008-08-21 21:02 ` Cedric Le Goater
[not found] ` <48ADD7D3.7080400-NmTC/0ZBporQT0dZR+AlfA@public.gmane.org>
2008-08-29 9:02 ` Cedric Le Goater
[not found] ` <48B7BB3C.5080404-NmTC/0ZBporQT0dZR+AlfA@public.gmane.org>
2008-09-02 3:04 ` Serge E. Hallyn
[not found] ` <20080902030426.GB12277-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
2008-09-02 10:00 ` Eric W. Biederman
[not found] ` <m1vdxeeuk0.fsf-B27657KtZYmhTnVgQlOflh2eb7JE58TQ@public.gmane.org>
2008-09-02 15:52 ` Serge E. Hallyn
[not found] ` <20080902155211.GF8524-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
2008-09-03 12:01 ` Cedric Le Goater
[not found] ` <48BE7C98.1040004-NmTC/0ZBporQT0dZR+AlfA@public.gmane.org>
2008-09-03 13:12 ` Eric W. Biederman
2008-09-03 13:41 ` Serge E. Hallyn
2008-09-03 11:47 ` Cedric Le Goater
[not found] ` <48BE7959.1080109-NmTC/0ZBporQT0dZR+AlfA@public.gmane.org>
2008-09-03 13:24 ` Serge E. Hallyn
2008-09-03 11:43 ` Cedric Le Goater
[not found] ` <48BE7845.6070500-NmTC/0ZBporQT0dZR+AlfA@public.gmane.org>
2008-09-03 13:23 ` Serge E. Hallyn
[not found] ` <20080903132307.GA9527-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
2008-09-03 13:52 ` Cedric Le Goater
2008-09-02 9:22 ` Eric W. Biederman
[not found] ` <m1d4jmhpgl.fsf-B27657KtZYmhTnVgQlOflh2eb7JE58TQ@public.gmane.org>
2008-09-03 12:04 ` Cedric Le Goater
This is an external index of several public inboxes, see mirroring instructions on how to clone and mirror all data and code used by this external index.