All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 0/9] Multiple devpts instances
@ 2008-10-15  5:30 sukadev-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8
       [not found] ` <20081015053000.GA2039-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
  0 siblings, 1 reply; 29+ messages in thread
From: sukadev-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8 @ 2008-10-15  5:30 UTC (permalink / raw)
  To: hpa-YMNOUZJC4hwAvxtiuMwx3w, alan-qBU/x9rampVanCEyBjwyrvXRex20P6io,
	serue-r/Jw6+rmf7HQT0dZR+AlfA, David C. Hansen
  Cc: kyle-hoO6YkzgTuCM0SS3m2neIg, ebiederm-aS9lmoZGLiVWk0Htik3J/w,
	bastian-yyjItF7Rl6lg9hUCZPvPmw, containers-qjLDD68F18O7TbgM5vRIOg,
	sukadev-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8,
	xemul-GEFAQzZX7r8dnm+yROfE0A


Enable multiple instances of devpts filesystem so each container can allocate
ptys independently.

User interface:

	Since supporting multiple mounts of devpts can break user-space, this
	feature is enabled only if:

		- CONFIG_DEVPTS_MULTIPLE_INSTANCES=y (new config token), and
		- new mount option, -o newinstance is specified

	If CONFIG_DEVPTS_MULTIPLE_INSTANCES=n, there should be no change in
	behavior.

	See [PATCH 9/9] - Documentation/filesystems/devpts.txt for detailed
	usage/compatibility information.

		[PATCH 1/9] Remove devpts_root global
		[PATCH 2/9] Per-mount allocated_ptys
		[PATCH 3/9] Per-mount 'config' object
		[PATCH 4/9] Extract option parsing to new function
		[PATCH 5/9] Add DEVPTS_MULTIPLE_INSTANCES config token
		[PATCH 6/9] Define mknod_ptmx()
		[PATCH 7/9] Define get_init_pts_sb()
		[PATCH 8/9] Enable multiple instances of devpts
		[PATCH 9/9] Document usage of multiple-instances of devpts
	
Implementation notes:

	1. To enable multiple mounts of /dev/pts, (most) devpts interfaces
	   need to know which instance of devpts is being accessed. This
	   patchset uses the 'struct inode' or 'struct tty_struct' of the
	   device being accessed to identify the appropriate devpts instance.
	   Hence the need for the /dev/pts/ptmx bind-mount.

	2. Mount options must be parsed twice during mount (once to determine
	   the mode of mount (single/multi-instance) and once to actually
	   save the options. There does not seem to be an easy way to parse
	   once and reuse (See 'safe_process_mount_opts()' in [PATCH 9/10])


Changelog [v5]:
	- Merge all option-parsing (previously in patches 6,7) into patch 6.
	- Bugfixes (see changelog in patches 6 and 8)
	- Replace get_sb_ref() with devpts-specific get_init_pts_sb() (patch 7)
	- Minor update to devpts.txt documentation (patch 9)


Changelog [v4]:

	- Port to 2008-09-04 ttydev tree (and drop patches that were merged in)
	- Add DEVPTS_MULTIPLE_INSTANCES config token (patch 5) and move new
	  behavior under #ifdef CONFIG_DEVPTS_MULTIPLE_INSTANCES. 
	- Create ptmx node even in initial mount 
	- Change default permissions of pts/ptmx node to 0000 (patch 6)
	- Cache ptmx dentry and use to update permissions of ptmx node on
	  remount so legacy mode can use the node with a simpler change to
	  /etc/fstab (patch 7)
	- Move get_sb_ref() helper code to separate patch (patch 8)
	- Add Documentation/filesystems/devpts.txt and describe usage info
	  in that file.
	  
Changelog [v3]:

	- Port to 2008-08-28 ttydev tree
	- Rename new mount options to 'ptmxmode' and 'newinstance'.
	- [Alan Cox] Use tty driver data to identify devpts (this is used to
	  cleanup get_node() in devpts_pty_kill()).
	- [H. Peter Anvin] get_node() cleanup in devpts (which was enabled by
	  the inode/tty parameter to devpts interfaces)
	- Bugfix in multi-mount mode (see Patch 11/11).
	- Executed pty tests in LTP (in both single-instance and multi-instance
	  mode)
	- Should be bisect-safe :-)

Changelog [v2]:

	- New mount option '-o newmnt' added (patch 8/8)
	- Support both single-mount and multi-mount semantics (patch 8/8)
	- Automatically create ptmx node when devpts is mounted (patch 7/8)
	- Extract option parsing code to new function (patch 6/8)
	- Make 'config' params per-mount variables (patch 5/8)
	- Slightly re-ordered existing patches in set (patches 1/8, 2/8)

TODO:
	- Do we need a '-o ptmxuid' and '-o ptmxgid' options as well ?
	- Update mount(8) man page
	- (Sometime in future) Remove even initial kernel mount of devpts 
	- Any other good test suites to test this (besides LTP, sshd).

^ permalink raw reply	[flat|nested] 29+ messages in thread

* [PATCH 1/9] Remove devpts_root global
       [not found] ` <20081015053000.GA2039-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
@ 2008-10-15  5:33   ` sukadev-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8
  2008-10-15  5:33   ` [PATCH 2/9] Per-mount allocated_ptys sukadev-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8
                     ` (9 subsequent siblings)
  10 siblings, 0 replies; 29+ messages in thread
From: sukadev-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8 @ 2008-10-15  5:33 UTC (permalink / raw)
  To: hpa-YMNOUZJC4hwAvxtiuMwx3w, alan-qBU/x9rampVanCEyBjwyrvXRex20P6io,
	serue-r/Jw6+rmf7HQT0dZR+AlfA, David C. Hansen
  Cc: kyle-hoO6YkzgTuCM0SS3m2neIg, ebiederm-aS9lmoZGLiVWk0Htik3J/w,
	bastian-yyjItF7Rl6lg9hUCZPvPmw, containers-qjLDD68F18O7TbgM5vRIOg,
	xemul-GEFAQzZX7r8dnm+yROfE0A


From 1479f9e238d607403abf5af4296bd5c84a1dc18f Mon Sep 17 00:00:00 2001
From: Sukadev Bhattiprolu <sukadev-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org>
Subject: [PATCH 1/9] Remove devpts_root global

Remove the 'devpts_root' global variable and find the root dentry using
the super_block. The super-block can be found from the device inode, using
the new wrapper, pts_sb_from_inode().

Changelog: This patch is based on an earlier patchset from Serge Hallyn
	   and Matt Helsley.

Signed-off-by: Sukadev Bhattiprolu <sukadev-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org>
Acked-by: Serge Hallyn <serue-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
---
 fs/devpts/inode.c |   29 ++++++++++++++++++++---------
 1 files changed, 20 insertions(+), 9 deletions(-)

diff --git a/fs/devpts/inode.c b/fs/devpts/inode.c
index a70d5d0..ec33833 100644
--- a/fs/devpts/inode.c
+++ b/fs/devpts/inode.c
@@ -34,7 +34,6 @@ static DEFINE_IDA(allocated_ptys);
 static DEFINE_MUTEX(allocated_ptys_lock);
 
 static struct vfsmount *devpts_mnt;
-static struct dentry *devpts_root;
 
 static struct {
 	int setuid;
@@ -56,6 +55,14 @@ static match_table_t tokens = {
 	{Opt_err, NULL}
 };
 
+static inline struct super_block *pts_sb_from_inode(struct inode *inode)
+{
+	if (inode->i_sb->s_magic == DEVPTS_SUPER_MAGIC)
+		return inode->i_sb;
+
+	return devpts_mnt->mnt_sb;
+}
+
 static int devpts_remount(struct super_block *sb, int *flags, char *data)
 {
 	char *p;
@@ -142,7 +149,7 @@ devpts_fill_super(struct super_block *s, void *data, int silent)
 	inode->i_fop = &simple_dir_operations;
 	inode->i_nlink = 2;
 
-	devpts_root = s->s_root = d_alloc_root(inode);
+	s->s_root = d_alloc_root(inode);
 	if (s->s_root)
 		return 0;
 	
@@ -211,7 +218,9 @@ int devpts_pty_new(struct inode *ptmx_inode, struct tty_struct *tty)
 	struct tty_driver *driver = tty->driver;
 	dev_t device = MKDEV(driver->major, driver->minor_start+number);
 	struct dentry *dentry;
-	struct inode *inode = new_inode(devpts_mnt->mnt_sb);
+	struct super_block *sb = pts_sb_from_inode(ptmx_inode);
+	struct inode *inode = new_inode(sb);
+	struct dentry *root = sb->s_root;
 	char s[12];
 
 	/* We're supposed to be given the slave end of a pty */
@@ -231,15 +240,15 @@ int devpts_pty_new(struct inode *ptmx_inode, struct tty_struct *tty)
 
 	sprintf(s, "%d", number);
 
-	mutex_lock(&devpts_root->d_inode->i_mutex);
+	mutex_lock(&root->d_inode->i_mutex);
 
-	dentry = d_alloc_name(devpts_root, s);
+	dentry = d_alloc_name(root, s);
 	if (!IS_ERR(dentry)) {
 		d_add(dentry, inode);
-		fsnotify_create(devpts_root->d_inode, dentry);
+		fsnotify_create(root->d_inode, dentry);
 	}
 
-	mutex_unlock(&devpts_root->d_inode->i_mutex);
+	mutex_unlock(&root->d_inode->i_mutex);
 
 	return 0;
 }
@@ -256,11 +265,13 @@ struct tty_struct *devpts_get_tty(struct inode *pts_inode, int number)
 void devpts_pty_kill(struct tty_struct *tty)
 {
 	struct inode *inode = tty->driver_data;
+	struct super_block *sb = pts_sb_from_inode(inode);
+	struct dentry *root = sb->s_root;
 	struct dentry *dentry;
 
 	BUG_ON(inode->i_rdev == MKDEV(TTYAUX_MAJOR, PTMX_MINOR));
 
-	mutex_lock(&devpts_root->d_inode->i_mutex);
+	mutex_lock(&root->d_inode->i_mutex);
 
 	dentry = d_find_alias(inode);
 	if (dentry && !IS_ERR(dentry)) {
@@ -269,7 +280,7 @@ void devpts_pty_kill(struct tty_struct *tty)
 		dput(dentry);
 	}
 
-	mutex_unlock(&devpts_root->d_inode->i_mutex);
+	mutex_unlock(&root->d_inode->i_mutex);
 }
 
 static int __init init_devpts_fs(void)
-- 
1.5.2.5

^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH 2/9] Per-mount allocated_ptys
       [not found] ` <20081015053000.GA2039-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
  2008-10-15  5:33   ` [PATCH 1/9] Remove devpts_root global sukadev-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8
@ 2008-10-15  5:33   ` sukadev-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8
  2008-10-15  5:34   ` [PATCH 3/9] Per-mount 'config' object sukadev-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8
                     ` (8 subsequent siblings)
  10 siblings, 0 replies; 29+ messages in thread
From: sukadev-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8 @ 2008-10-15  5:33 UTC (permalink / raw)
  To: hpa-YMNOUZJC4hwAvxtiuMwx3w, alan-qBU/x9rampVanCEyBjwyrvXRex20P6io,
	serue-r/Jw6+rmf7HQT0dZR+AlfA, David C. Hansen
  Cc: kyle-hoO6YkzgTuCM0SS3m2neIg, ebiederm-aS9lmoZGLiVWk0Htik3J/w,
	bastian-yyjItF7Rl6lg9hUCZPvPmw, containers-qjLDD68F18O7TbgM5vRIOg,
	xemul-GEFAQzZX7r8dnm+yROfE0A


From d12a714cbd541b808a80f9f556fda4a1f3bf4198 Mon Sep 17 00:00:00 2001
From: Sukadev Bhattiprolu <sukadev-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org>
Subject: [PATCH 2/9] Per-mount allocated_ptys

To enable multiple mounts of devpts, 'allocated_ptys' must be a per-mount
variable rather than a global variable.  Move 'allocated_ptys' into the
super_block's s_fs_info.

Changelog[v2]:
	Define and use DEVPTS_SB() wrapper.

Signed-off-by: Sukadev Bhattiprolu <sukadev-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org>
---
 fs/devpts/inode.c |   55 ++++++++++++++++++++++++++++++++++++++++++++++------
 1 files changed, 48 insertions(+), 7 deletions(-)

diff --git a/fs/devpts/inode.c b/fs/devpts/inode.c
index ec33833..6e63db7 100644
--- a/fs/devpts/inode.c
+++ b/fs/devpts/inode.c
@@ -30,7 +30,6 @@
 #define PTMX_MINOR	2
 
 extern int pty_limit;			/* Config limit on Unix98 ptys */
-static DEFINE_IDA(allocated_ptys);
 static DEFINE_MUTEX(allocated_ptys_lock);
 
 static struct vfsmount *devpts_mnt;
@@ -55,6 +54,15 @@ static match_table_t tokens = {
 	{Opt_err, NULL}
 };
 
+struct pts_fs_info {
+	struct ida allocated_ptys;
+};
+
+static inline struct pts_fs_info *DEVPTS_SB(struct super_block *sb)
+{
+	return sb->s_fs_info;
+}
+
 static inline struct super_block *pts_sb_from_inode(struct inode *inode)
 {
 	if (inode->i_sb->s_magic == DEVPTS_SUPER_MAGIC)
@@ -126,6 +134,19 @@ static const struct super_operations devpts_sops = {
 	.show_options	= devpts_show_options,
 };
 
+static void *new_pts_fs_info(void)
+{
+	struct pts_fs_info *fsi;
+
+	fsi = kzalloc(sizeof(struct pts_fs_info), GFP_KERNEL);
+	if (!fsi)
+		return NULL;
+
+	ida_init(&fsi->allocated_ptys);
+
+	return fsi;
+}
+
 static int
 devpts_fill_super(struct super_block *s, void *data, int silent)
 {
@@ -137,9 +158,13 @@ devpts_fill_super(struct super_block *s, void *data, int silent)
 	s->s_op = &devpts_sops;
 	s->s_time_gran = 1;
 
+	s->s_fs_info = new_pts_fs_info();
+	if (!s->s_fs_info)
+		goto fail;
+
 	inode = new_inode(s);
 	if (!inode)
-		goto fail;
+		goto free_fsi;
 	inode->i_ino = 1;
 	inode->i_mtime = inode->i_atime = inode->i_ctime = CURRENT_TIME;
 	inode->i_blocks = 0;
@@ -155,6 +180,9 @@ devpts_fill_super(struct super_block *s, void *data, int silent)
 	
 	printk("devpts: get root dentry failed\n");
 	iput(inode);
+
+free_fsi:
+	kfree(s->s_fs_info);
 fail:
 	return -ENOMEM;
 }
@@ -165,11 +193,19 @@ static int devpts_get_sb(struct file_system_type *fs_type,
 	return get_sb_single(fs_type, flags, data, devpts_fill_super, mnt);
 }
 
+static void devpts_kill_sb(struct super_block *sb)
+{
+	struct pts_fs_info *fsi = DEVPTS_SB(sb);
+
+	kfree(fsi);
+	kill_anon_super(sb);
+}
+
 static struct file_system_type devpts_fs_type = {
 	.owner		= THIS_MODULE,
 	.name		= "devpts",
 	.get_sb		= devpts_get_sb,
-	.kill_sb	= kill_anon_super,
+	.kill_sb	= devpts_kill_sb,
 };
 
 /*
@@ -179,16 +215,18 @@ static struct file_system_type devpts_fs_type = {
 
 int devpts_new_index(struct inode *ptmx_inode)
 {
+	struct super_block *sb = pts_sb_from_inode(ptmx_inode);
+	struct pts_fs_info *fsi = DEVPTS_SB(sb);
 	int index;
 	int ida_ret;
 
 retry:
-	if (!ida_pre_get(&allocated_ptys, GFP_KERNEL)) {
+	if (!ida_pre_get(&fsi->allocated_ptys, GFP_KERNEL)) {
 		return -ENOMEM;
 	}
 
 	mutex_lock(&allocated_ptys_lock);
-	ida_ret = ida_get_new(&allocated_ptys, &index);
+	ida_ret = ida_get_new(&fsi->allocated_ptys, &index);
 	if (ida_ret < 0) {
 		mutex_unlock(&allocated_ptys_lock);
 		if (ida_ret == -EAGAIN)
@@ -197,7 +235,7 @@ retry:
 	}
 
 	if (index >= pty_limit) {
-		ida_remove(&allocated_ptys, index);
+		ida_remove(&fsi->allocated_ptys, index);
 		mutex_unlock(&allocated_ptys_lock);
 		return -EIO;
 	}
@@ -207,8 +245,11 @@ retry:
 
 void devpts_kill_index(struct inode *ptmx_inode, int idx)
 {
+	struct super_block *sb = pts_sb_from_inode(ptmx_inode);
+	struct pts_fs_info *fsi = DEVPTS_SB(sb);
+
 	mutex_lock(&allocated_ptys_lock);
-	ida_remove(&allocated_ptys, idx);
+	ida_remove(&fsi->allocated_ptys, idx);
 	mutex_unlock(&allocated_ptys_lock);
 }
 
-- 
1.5.2.5

^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH 3/9] Per-mount 'config' object
       [not found] ` <20081015053000.GA2039-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
  2008-10-15  5:33   ` [PATCH 1/9] Remove devpts_root global sukadev-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8
  2008-10-15  5:33   ` [PATCH 2/9] Per-mount allocated_ptys sukadev-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8
@ 2008-10-15  5:34   ` sukadev-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8
  2008-10-15  5:35   ` [PATCH 4/9] Extract option parsing to new function sukadev-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8
                     ` (7 subsequent siblings)
  10 siblings, 0 replies; 29+ messages in thread
From: sukadev-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8 @ 2008-10-15  5:34 UTC (permalink / raw)
  To: hpa-YMNOUZJC4hwAvxtiuMwx3w, alan-qBU/x9rampVanCEyBjwyrvXRex20P6io,
	serue-r/Jw6+rmf7HQT0dZR+AlfA, David C. Hansen
  Cc: kyle-hoO6YkzgTuCM0SS3m2neIg, ebiederm-aS9lmoZGLiVWk0Htik3J/w,
	bastian-yyjItF7Rl6lg9hUCZPvPmw, containers-qjLDD68F18O7TbgM5vRIOg,
	xemul-GEFAQzZX7r8dnm+yROfE0A


From fa07e30bf77063b127129d317e91d6dc454ea739 Mon Sep 17 00:00:00 2001
From: Sukadev Bhattiprolu <sukadev-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org>
Subject: [PATCH 3/9] Per-mount 'config' object

With support for multiple mounts of devpts, the 'config' structure really
represents per-mount options rather than config parameters. Rename 'config'
structure to 'pts_mount_opts' and store it in the super-block.

Signed-off-by: Sukadev Bhattiprolu <sukadev-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org>
Acked-by: Serge Hallyn <serue-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
---
 fs/devpts/inode.c |   49 +++++++++++++++++++++++++++++--------------------
 1 files changed, 29 insertions(+), 20 deletions(-)

diff --git a/fs/devpts/inode.c b/fs/devpts/inode.c
index 6e63db7..e91c15c 100644
--- a/fs/devpts/inode.c
+++ b/fs/devpts/inode.c
@@ -34,13 +34,13 @@ static DEFINE_MUTEX(allocated_ptys_lock);
 
 static struct vfsmount *devpts_mnt;
 
-static struct {
+struct pts_mount_opts {
 	int setuid;
 	int setgid;
 	uid_t   uid;
 	gid_t   gid;
 	umode_t mode;
-} config = {.mode = DEVPTS_DEFAULT_MODE};
+};
 
 enum {
 	Opt_uid, Opt_gid, Opt_mode,
@@ -56,6 +56,7 @@ static match_table_t tokens = {
 
 struct pts_fs_info {
 	struct ida allocated_ptys;
+	struct pts_mount_opts mount_opts;
 };
 
 static inline struct pts_fs_info *DEVPTS_SB(struct super_block *sb)
@@ -74,12 +75,14 @@ static inline struct super_block *pts_sb_from_inode(struct inode *inode)
 static int devpts_remount(struct super_block *sb, int *flags, char *data)
 {
 	char *p;
+	struct pts_fs_info *fsi = DEVPTS_SB(sb);
+	struct pts_mount_opts *opts = &fsi->mount_opts;
 
-	config.setuid  = 0;
-	config.setgid  = 0;
-	config.uid     = 0;
-	config.gid     = 0;
-	config.mode    = DEVPTS_DEFAULT_MODE;
+	opts->setuid  = 0;
+	opts->setgid  = 0;
+	opts->uid     = 0;
+	opts->gid     = 0;
+	opts->mode    = DEVPTS_DEFAULT_MODE;
 
 	while ((p = strsep(&data, ",")) != NULL) {
 		substring_t args[MAX_OPT_ARGS];
@@ -94,19 +97,19 @@ static int devpts_remount(struct super_block *sb, int *flags, char *data)
 		case Opt_uid:
 			if (match_int(&args[0], &option))
 				return -EINVAL;
-			config.uid = option;
-			config.setuid = 1;
+			opts->uid = option;
+			opts->setuid = 1;
 			break;
 		case Opt_gid:
 			if (match_int(&args[0], &option))
 				return -EINVAL;
-			config.gid = option;
-			config.setgid = 1;
+			opts->gid = option;
+			opts->setgid = 1;
 			break;
 		case Opt_mode:
 			if (match_octal(&args[0], &option))
 				return -EINVAL;
-			config.mode = option & S_IALLUGO;
+			opts->mode = option & S_IALLUGO;
 			break;
 		default:
 			printk(KERN_ERR "devpts: called with bogus options\n");
@@ -119,11 +122,14 @@ static int devpts_remount(struct super_block *sb, int *flags, char *data)
 
 static int devpts_show_options(struct seq_file *seq, struct vfsmount *vfs)
 {
-	if (config.setuid)
-		seq_printf(seq, ",uid=%u", config.uid);
-	if (config.setgid)
-		seq_printf(seq, ",gid=%u", config.gid);
-	seq_printf(seq, ",mode=%03o", config.mode);
+	struct pts_fs_info *fsi = DEVPTS_SB(vfs->mnt_sb);
+	struct pts_mount_opts *opts = &fsi->mount_opts;
+
+	if (opts->setuid)
+		seq_printf(seq, ",uid=%u", opts->uid);
+	if (opts->setgid)
+		seq_printf(seq, ",gid=%u", opts->gid);
+	seq_printf(seq, ",mode=%03o", opts->mode);
 
 	return 0;
 }
@@ -143,6 +149,7 @@ static void *new_pts_fs_info(void)
 		return NULL;
 
 	ida_init(&fsi->allocated_ptys);
+	fsi->mount_opts.mode = DEVPTS_DEFAULT_MODE;
 
 	return fsi;
 }
@@ -262,6 +269,8 @@ int devpts_pty_new(struct inode *ptmx_inode, struct tty_struct *tty)
 	struct super_block *sb = pts_sb_from_inode(ptmx_inode);
 	struct inode *inode = new_inode(sb);
 	struct dentry *root = sb->s_root;
+	struct pts_fs_info *fsi = DEVPTS_SB(sb);
+	struct pts_mount_opts *opts = &fsi->mount_opts;
 	char s[12];
 
 	/* We're supposed to be given the slave end of a pty */
@@ -272,10 +281,10 @@ int devpts_pty_new(struct inode *ptmx_inode, struct tty_struct *tty)
 		return -ENOMEM;
 
 	inode->i_ino = number+2;
-	inode->i_uid = config.setuid ? config.uid : current->fsuid;
-	inode->i_gid = config.setgid ? config.gid : current->fsgid;
+	inode->i_uid = opts->setuid ? opts->uid : current->fsuid;
+	inode->i_gid = opts->setgid ? opts->gid : current->fsgid;
 	inode->i_mtime = inode->i_atime = inode->i_ctime = CURRENT_TIME;
-	init_special_inode(inode, S_IFCHR|config.mode, device);
+	init_special_inode(inode, S_IFCHR|opts->mode, device);
 	inode->i_private = tty;
 	tty->driver_data = inode;
 
-- 
1.5.2.5

^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH 4/9] Extract option parsing to new function
       [not found] ` <20081015053000.GA2039-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
                     ` (2 preceding siblings ...)
  2008-10-15  5:34   ` [PATCH 3/9] Per-mount 'config' object sukadev-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8
@ 2008-10-15  5:35   ` sukadev-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8
  2008-10-15  5:35   ` [PATCH 5/9] Add DEVPTS_MULTIPLE_INSTANCES config token sukadev-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8
                     ` (6 subsequent siblings)
  10 siblings, 0 replies; 29+ messages in thread
From: sukadev-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8 @ 2008-10-15  5:35 UTC (permalink / raw)
  To: hpa-YMNOUZJC4hwAvxtiuMwx3w, alan-qBU/x9rampVanCEyBjwyrvXRex20P6io,
	serue-r/Jw6+rmf7HQT0dZR+AlfA, David C. Hansen
  Cc: kyle-hoO6YkzgTuCM0SS3m2neIg, ebiederm-aS9lmoZGLiVWk0Htik3J/w,
	bastian-yyjItF7Rl6lg9hUCZPvPmw, containers-qjLDD68F18O7TbgM5vRIOg,
	xemul-GEFAQzZX7r8dnm+yROfE0A


From c4e1a348c2424ce503c24c8a56fa91015d9ee194 Mon Sep 17 00:00:00 2001
From: Sukadev Bhattiprolu <sukadev-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org>
Subject: [PATCH 4/9] Extract option parsing to new function

Move code to parse mount options into a separate function so it can
(later) be shared between mount and remount operations.

Signed-off-by: Sukadev Bhattiprolu <sukadev-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org>
Acked-by: Serge Hallyn <serue-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
---
 fs/devpts/inode.c |   12 +++++++++---
 1 files changed, 9 insertions(+), 3 deletions(-)

diff --git a/fs/devpts/inode.c b/fs/devpts/inode.c
index e91c15c..7ae60aa 100644
--- a/fs/devpts/inode.c
+++ b/fs/devpts/inode.c
@@ -72,11 +72,9 @@ static inline struct super_block *pts_sb_from_inode(struct inode *inode)
 	return devpts_mnt->mnt_sb;
 }
 
-static int devpts_remount(struct super_block *sb, int *flags, char *data)
+static int parse_mount_options(char *data, struct pts_mount_opts *opts)
 {
 	char *p;
-	struct pts_fs_info *fsi = DEVPTS_SB(sb);
-	struct pts_mount_opts *opts = &fsi->mount_opts;
 
 	opts->setuid  = 0;
 	opts->setgid  = 0;
@@ -120,6 +118,14 @@ static int devpts_remount(struct super_block *sb, int *flags, char *data)
 	return 0;
 }
 
+static int devpts_remount(struct super_block *sb, int *flags, char *data)
+{
+	struct pts_fs_info *fsi = DEVPTS_SB(sb);
+	struct pts_mount_opts *opts = &fsi->mount_opts;
+
+	return parse_mount_options(data, opts);
+}
+
 static int devpts_show_options(struct seq_file *seq, struct vfsmount *vfs)
 {
 	struct pts_fs_info *fsi = DEVPTS_SB(vfs->mnt_sb);
-- 
1.5.2.5

^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH 5/9] Add DEVPTS_MULTIPLE_INSTANCES config token
       [not found] ` <20081015053000.GA2039-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
                     ` (3 preceding siblings ...)
  2008-10-15  5:35   ` [PATCH 4/9] Extract option parsing to new function sukadev-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8
@ 2008-10-15  5:35   ` sukadev-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8
  2008-10-15  5:36   ` [PATCH 6/9] Define mknod_ptmx() sukadev-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8
                     ` (5 subsequent siblings)
  10 siblings, 0 replies; 29+ messages in thread
From: sukadev-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8 @ 2008-10-15  5:35 UTC (permalink / raw)
  To: hpa-YMNOUZJC4hwAvxtiuMwx3w, alan-qBU/x9rampVanCEyBjwyrvXRex20P6io,
	serue-r/Jw6+rmf7HQT0dZR+AlfA, David C. Hansen
  Cc: kyle-hoO6YkzgTuCM0SS3m2neIg, ebiederm-aS9lmoZGLiVWk0Htik3J/w,
	bastian-yyjItF7Rl6lg9hUCZPvPmw, containers-qjLDD68F18O7TbgM5vRIOg,
	xemul-GEFAQzZX7r8dnm+yROfE0A


From 18a468a2f2db8f055bf62882d44d40764e924f3b Mon Sep 17 00:00:00 2001
From: Sukadev Bhattiprolu <sukadev-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org>
Subject: [PATCH 5/9] Add DEVPTS_MULTIPLE_INSTANCES config token

Signed-off-by: Sukadev Bhattiprolu <sukadev-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org>
Acked-by: Serge Hallyn <serue-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
---
 drivers/char/Kconfig |   11 +++++++++++
 1 files changed, 11 insertions(+), 0 deletions(-)

diff --git a/drivers/char/Kconfig b/drivers/char/Kconfig
index 700ff96..0d3ea89 100644
--- a/drivers/char/Kconfig
+++ b/drivers/char/Kconfig
@@ -443,6 +443,17 @@ config UNIX98_PTYS
 	  All modern Linux systems use the Unix98 ptys.  Say Y unless
 	  you're on an embedded system and want to conserve memory.
 
+config DEVPTS_MULTIPLE_INSTANCES
+	bool "Support multiple instances of devpts"
+	depends on UNIX98_PTYS
+	default n
+	---help---
+	  Enable support for multiple instances of devpts filesystem.
+	  If you want to have isolated PTY namespaces (eg: in containers),
+	  say Y here.  Otherwise, say N. If enabled, each mount of devpts
+	  filesystem with the '-o newinstance' option will create an
+	  independent PTY namespace.
+
 config LEGACY_PTYS
 	bool "Legacy (BSD) PTY support"
 	default y
-- 
1.5.2.5

^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH 6/9] Define mknod_ptmx()
       [not found] ` <20081015053000.GA2039-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
                     ` (4 preceding siblings ...)
  2008-10-15  5:35   ` [PATCH 5/9] Add DEVPTS_MULTIPLE_INSTANCES config token sukadev-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8
@ 2008-10-15  5:36   ` sukadev-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8
  2008-10-15  5:37   ` [PATCH 7/9] Define get_init_pts_sb() sukadev-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8
                     ` (4 subsequent siblings)
  10 siblings, 0 replies; 29+ messages in thread
From: sukadev-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8 @ 2008-10-15  5:36 UTC (permalink / raw)
  To: hpa-YMNOUZJC4hwAvxtiuMwx3w, alan-qBU/x9rampVanCEyBjwyrvXRex20P6io,
	serue-r/Jw6+rmf7HQT0dZR+AlfA, David C. Hansen
  Cc: kyle-hoO6YkzgTuCM0SS3m2neIg, ebiederm-aS9lmoZGLiVWk0Htik3J/w,
	bastian-yyjItF7Rl6lg9hUCZPvPmw, containers-qjLDD68F18O7TbgM5vRIOg,
	xemul-GEFAQzZX7r8dnm+yROfE0A


From ff0c06fb1878a3c20a6a91d9666d44f56eb94ff7 Mon Sep 17 00:00:00 2001
From: Sukadev Bhattiprolu <sukadev-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org>
Subject: [PATCH 6/9] Define mknod_ptmx()

/dev/ptmx is closely tied to the devpts filesystem. An open of /dev/ptmx,
allocates the next pty index and the associated device shows up in the
devpts fs as /dev/pts/n.

Wih multiple instancs of devpts filesystem, during an open of /dev/ptmx
we would be unable to determine which instance of the devpts is being
accessed.

So we move the 'ptmx' node into /dev/pts and use the inode of the 'ptmx'
node to identify the superblock and hence the devpts instance.  This patch
adds ability for the kernel to internally create the [ptmx, c, 5:2] device
when mounting devpts filesystem.  Since the ptmx node in devpts is new and
may surprise some userspace scripts, the default permissions for the new
node is 0000.  These permissions can be changed either using chmod or by
remounting with the new '-o ptmxmode=0666' mount option.

Changelog[v5]:
	- [Serge Hallyn bugfix]: Letting new_inode() assign inode number to
	  ptmx can collide with hand-assigning inode numbers to ptys. So,
	  hand-assign specific inode number to ptmx node also.
	- [Serge Hallyn]: Maybe safer to grab root dentry mutex while creating
	  ptmx node
	- [Bugfix with Serge Hallyn] Replace lookup_one_len() in mknod_ptmx()
	  wih d_alloc_name() (lookup during ->get_sb() locks up system). To
	  simplify patchset, fold the ptmx_dentry patch into this.

Changelog[v4]:
	- Change default permissions of pts/ptmx node to 0000.
	- Move code for ptmxmode under #ifdef CONFIG_DEVPTS_MULTIPLE_INSTANCES.

Changelog[v3]:
	- Rename ptmx_mode to ptmxmode (for consistency with 'newinstance')

Changelog[v2]:
	- [H. Peter Anvin] Remove mknod() system call support and create the
	  ptmx node internally.

Changelog[v1]:
	- Earlier version of this patch enabled creating /dev/pts/tty as
	  well. As pointed out by Al Viro and H. Peter Anvin, that is not
	  really necessary.

Signed-off-by: Sukadev Bhattiprolu <sukadev-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org>
---
 fs/devpts/inode.c |  113 +++++++++++++++++++++++++++++++++++++++++++++++++++--
 1 files changed, 109 insertions(+), 4 deletions(-)

diff --git a/fs/devpts/inode.c b/fs/devpts/inode.c
index 7ae60aa..bbdd7df 100644
--- a/fs/devpts/inode.c
+++ b/fs/devpts/inode.c
@@ -27,6 +27,13 @@
 #define DEVPTS_SUPER_MAGIC 0x1cd1
 
 #define DEVPTS_DEFAULT_MODE 0600
+/*
+ * ptmx is a new node in /dev/pts and will be unused in legacy (single-
+ * instance) mode. To prevent surprises in user space, set permissions of
+ * ptmx to 0. Use 'chmod' or remount with '-o ptmxmode' to set meaningful
+ * permissions.
+ */
+#define DEVPTS_DEFAULT_PTMX_MODE 0000
 #define PTMX_MINOR	2
 
 extern int pty_limit;			/* Config limit on Unix98 ptys */
@@ -40,10 +47,11 @@ struct pts_mount_opts {
 	uid_t   uid;
 	gid_t   gid;
 	umode_t mode;
+	umode_t ptmxmode;
 };
 
 enum {
-	Opt_uid, Opt_gid, Opt_mode,
+	Opt_uid, Opt_gid, Opt_mode, Opt_ptmxmode,
 	Opt_err
 };
 
@@ -51,12 +59,16 @@ static match_table_t tokens = {
 	{Opt_uid, "uid=%u"},
 	{Opt_gid, "gid=%u"},
 	{Opt_mode, "mode=%o"},
+#ifdef CONFIG_DEVPTS_MULTIPLE_INSTANCES
+	{Opt_ptmxmode, "ptmxmode=%o"},
+#endif
 	{Opt_err, NULL}
 };
 
 struct pts_fs_info {
 	struct ida allocated_ptys;
 	struct pts_mount_opts mount_opts;
+	struct dentry *ptmx_dentry;
 };
 
 static inline struct pts_fs_info *DEVPTS_SB(struct super_block *sb)
@@ -81,6 +93,7 @@ static int parse_mount_options(char *data, struct pts_mount_opts *opts)
 	opts->uid     = 0;
 	opts->gid     = 0;
 	opts->mode    = DEVPTS_DEFAULT_MODE;
+	opts->ptmxmode = DEVPTS_DEFAULT_PTMX_MODE;
 
 	while ((p = strsep(&data, ",")) != NULL) {
 		substring_t args[MAX_OPT_ARGS];
@@ -109,6 +122,13 @@ static int parse_mount_options(char *data, struct pts_mount_opts *opts)
 				return -EINVAL;
 			opts->mode = option & S_IALLUGO;
 			break;
+#ifdef CONFIG_DEVPTS_MULTIPLE_INSTANCES
+		case Opt_ptmxmode:
+			if (match_octal(&args[0], &option))
+				return -EINVAL;
+			opts->ptmxmode = option & S_IALLUGO;
+			break;
+#endif
 		default:
 			printk(KERN_ERR "devpts: called with bogus options\n");
 			return -EINVAL;
@@ -118,12 +138,93 @@ static int parse_mount_options(char *data, struct pts_mount_opts *opts)
 	return 0;
 }
 
+#ifdef CONFIG_DEVPTS_MULTIPLE_INSTANCES
+static int mknod_ptmx(struct super_block *sb)
+{
+	int mode;
+	int rc = -ENOMEM;
+	struct dentry *dentry;
+	struct inode *inode;
+	struct dentry *root = sb->s_root;
+	struct pts_fs_info *fsi = DEVPTS_SB(sb);
+	struct pts_mount_opts *opts = &fsi->mount_opts;
+
+	mutex_lock(&root->d_inode->i_mutex);
+
+	/* If we have already created ptmx node, return */
+	if (fsi->ptmx_dentry) {
+		rc = 0;
+		goto out;
+	}
+
+	dentry = d_alloc_name(root, "ptmx");
+	if (!dentry) {
+		printk(KERN_NOTICE "Unable to alloc dentry for ptmx node\n");
+		goto out;
+	}
+
+	/*
+	 * Create a new 'ptmx' node in this mount of devpts.
+	 */
+	inode = new_inode(sb);
+	if (!inode) {
+		printk(KERN_ERR "Unable to alloc inode for ptmx node\n");
+		dput(dentry);
+		goto out;
+	}
+
+	inode->i_ino = 2;
+	inode->i_uid = inode->i_gid = 0;
+	inode->i_blocks = 0;
+	inode->i_mtime = inode->i_atime = inode->i_ctime = CURRENT_TIME;
+
+	mode = S_IFCHR|opts->ptmxmode;
+	init_special_inode(inode, mode, MKDEV(TTYAUX_MAJOR, 2));
+
+	d_add(dentry, inode);
+	
+	fsi->ptmx_dentry = dentry;
+	rc = 0;
+
+	printk(KERN_DEBUG "Created ptmx node in devpts ino %lu\n",
+			inode->i_ino);
+out:
+	mutex_unlock(&root->d_inode->i_mutex);
+	return rc;
+}
+
+static void update_ptmx_mode(struct pts_fs_info *fsi)
+{
+       struct inode *inode;
+       if (fsi->ptmx_dentry) {
+               inode = fsi->ptmx_dentry->d_inode;
+               inode->i_mode = S_IFCHR|fsi->mount_opts.ptmxmode;
+       }
+}
+#else
+static inline void update_ptmx_mode(struct pts_fs_info *fsi)
+{
+       return;
+}
+#endif
+
 static int devpts_remount(struct super_block *sb, int *flags, char *data)
 {
+	int err;
 	struct pts_fs_info *fsi = DEVPTS_SB(sb);
 	struct pts_mount_opts *opts = &fsi->mount_opts;
 
-	return parse_mount_options(data, opts);
+	err = parse_mount_options(data, opts);
+
+	/*
+	 * parse_mount_options() restores options to default values
+	 * before parsing and may have changed ptmxmode. So, update the
+	 * mode in the inode too. Bogus options don't fail the remount,
+	 * so do this even on error return.
+	 */
+	update_ptmx_mode(fsi);
+
+	return err;
 }
 
 static int devpts_show_options(struct seq_file *seq, struct vfsmount *vfs)
@@ -136,6 +237,9 @@ static int devpts_show_options(struct seq_file *seq, struct vfsmount *vfs)
 	if (opts->setgid)
 		seq_printf(seq, ",gid=%u", opts->gid);
 	seq_printf(seq, ",mode=%03o", opts->mode);
+#ifdef CONFIG_DEVPTS_MULTIPLE_INSTANCES
+	seq_printf(seq, ",ptmxmode=%03o", opts->ptmxmode);
+#endif
 
 	return 0;
 }
@@ -156,6 +260,7 @@ static void *new_pts_fs_info(void)
 
 	ida_init(&fsi->allocated_ptys);
 	fsi->mount_opts.mode = DEVPTS_DEFAULT_MODE;
+	fsi->mount_opts.ptmxmode = DEVPTS_DEFAULT_PTMX_MODE;
 
 	return fsi;
 }
@@ -211,7 +316,7 @@ static void devpts_kill_sb(struct super_block *sb)
 	struct pts_fs_info *fsi = DEVPTS_SB(sb);
 
 	kfree(fsi);
-	kill_anon_super(sb);
+	kill_litter_super(sb);
 }
 
 static struct file_system_type devpts_fs_type = {
@@ -286,7 +391,7 @@ int devpts_pty_new(struct inode *ptmx_inode, struct tty_struct *tty)
 	if (!inode)
 		return -ENOMEM;
 
-	inode->i_ino = number+2;
+	inode->i_ino = number+3;
 	inode->i_uid = opts->setuid ? opts->uid : current->fsuid;
 	inode->i_gid = opts->setgid ? opts->gid : current->fsgid;
 	inode->i_mtime = inode->i_atime = inode->i_ctime = CURRENT_TIME;
-- 
1.5.2.5

^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH 7/9] Define get_init_pts_sb()
       [not found] ` <20081015053000.GA2039-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
                     ` (5 preceding siblings ...)
  2008-10-15  5:36   ` [PATCH 6/9] Define mknod_ptmx() sukadev-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8
@ 2008-10-15  5:37   ` sukadev-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8
  2008-10-15  5:37   ` [PATCH 8/9] Enable multiple instances of devpts sukadev-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8
                     ` (3 subsequent siblings)
  10 siblings, 0 replies; 29+ messages in thread
From: sukadev-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8 @ 2008-10-15  5:37 UTC (permalink / raw)
  To: hpa-YMNOUZJC4hwAvxtiuMwx3w, alan-qBU/x9rampVanCEyBjwyrvXRex20P6io,
	serue-r/Jw6+rmf7HQT0dZR+AlfA, David C. Hansen
  Cc: kyle-hoO6YkzgTuCM0SS3m2neIg, ebiederm-aS9lmoZGLiVWk0Htik3J/w,
	bastian-yyjItF7Rl6lg9hUCZPvPmw, containers-qjLDD68F18O7TbgM5vRIOg,
	xemul-GEFAQzZX7r8dnm+yROfE0A


From 3a2b7147d5aa345ab96d321ffefd326cbc43e24d Mon Sep 17 00:00:00 2001
From: Sukadev Bhattiprolu <sukadev-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org>
Subject: [PATCH 7/9] Define get_init_pts_sb()

See comments in the function header for details. The new interface will
be used in a follow-on patch.

Changelog [v2]:
	[Dave Hansen] Replace get_sb_ref() in fs/super.c with get_init_pts_sb()
	and make the new interface private to devpts

Signed-off-by: Sukadev Bhattiprolu <sukadev-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org>
---
 fs/devpts/inode.c |   55 ++++++++++++++++++++++++++++++++++++++++++++++++++++-
 1 files changed, 54 insertions(+), 1 deletions(-)

diff --git a/fs/devpts/inode.c b/fs/devpts/inode.c
index bbdd7df..1dfdbf0 100644
--- a/fs/devpts/inode.c
+++ b/fs/devpts/inode.c
@@ -305,10 +305,63 @@ fail:
 	return -ENOMEM;
 }
 
+static int compare_init_pts_sb(struct super_block *s, void *p)
+{
+	if (devpts_mnt)
+		return devpts_mnt->mnt_sb == s;
+
+	return 0;
+}
+
+/*
+ * get_init_pts_sb()
+ *
+ *     This interface is needed to support multiple namespace semantics in
+ *     devpts while preserving backward compatibility of the current 'single-
+ *     namespace' semantics. i.e all mounts of devpts without the 'newinstance'
+ *     mount option should bind to the initial kernel mount, like
+ *     get_sb_single().
+ *
+ *     Mounts with 'newinstance' option create a new private namespace.
+ *
+ *     But for single-mount semantics, devpts cannot use get_sb_single(),
+ *     because get_sb_single()/sget() find and use the super-block from
+ *     the most recent mount of devpts. But that recent mount may be a
+ *     'newinstance' mount and get_sb_single() would pick the newinstance
+ *     super-block instead of the initial super-block.
+ *
+ *     This interface is identical to get_sb_single() except that it
+ *     consistently selects the 'single-namespace' superblock even in the
+ *     presence of the private namespace (i.e 'newinstance') super-blocks.
+ */
+static int get_init_pts_sb(struct file_system_type *fs_type, int flags,
+		void *data, struct vfsmount *mnt)
+{
+        struct super_block *s;
+        int error;
+
+        s = sget(fs_type, compare_init_pts_sb, set_anon_super, NULL);
+        if (IS_ERR(s))
+                return PTR_ERR(s);
+
+        if (!s->s_root) {
+                s->s_flags = flags;
+                error = devpts_fill_super(s, data, flags & MS_SILENT ? 1 : 0);
+                if (error) {
+                        up_write(&s->s_umount);
+                        deactivate_super(s);
+                        return error;
+                }
+                s->s_flags |= MS_ACTIVE;
+        }
+        do_remount_sb(s, flags, data, 0);
+        return simple_set_mnt(mnt, s);
+}
+
 static int devpts_get_sb(struct file_system_type *fs_type,
 	int flags, const char *dev_name, void *data, struct vfsmount *mnt)
 {
-	return get_sb_single(fs_type, flags, data, devpts_fill_super, mnt);
+	return get_init_pts_sb(fs_type, flags, data, mnt);
 }
 
 static void devpts_kill_sb(struct super_block *sb)
-- 
1.5.2.5

^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH 8/9] Enable multiple instances of devpts
       [not found] ` <20081015053000.GA2039-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
                     ` (6 preceding siblings ...)
  2008-10-15  5:37   ` [PATCH 7/9] Define get_init_pts_sb() sukadev-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8
@ 2008-10-15  5:37   ` sukadev-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8
  2008-10-15  5:38   ` [PATCH 9/9] Document usage of multiple-instances " sukadev-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8
                     ` (2 subsequent siblings)
  10 siblings, 0 replies; 29+ messages in thread
From: sukadev-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8 @ 2008-10-15  5:37 UTC (permalink / raw)
  To: hpa-YMNOUZJC4hwAvxtiuMwx3w, alan-qBU/x9rampVanCEyBjwyrvXRex20P6io,
	serue-r/Jw6+rmf7HQT0dZR+AlfA, David C. Hansen
  Cc: kyle-hoO6YkzgTuCM0SS3m2neIg, ebiederm-aS9lmoZGLiVWk0Htik3J/w,
	bastian-yyjItF7Rl6lg9hUCZPvPmw, containers-qjLDD68F18O7TbgM5vRIOg,
	xemul-GEFAQzZX7r8dnm+yROfE0A


From 80380a560dfe89dede7df33e9e4360653f9bda14 Mon Sep 17 00:00:00 2001
From: Sukadev Bhattiprolu <sukadev-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org>
Subject: [PATCH 8/9] Enable multiple instances of devpts

To support containers, allow multiple instances of devpts filesystem, such
that indices of ptys allocated in one instance are independent of ptys
allocated in other instances of devpts.

But to preserve backward compatibility, enable this support for multiple
instances only if:

	- CONFIG_DEVPTS_MULTIPLE_INSTANCES is set to Y, and
	- '-o newinstance' mount option is specified while mounting devpts

To use multi-instance mount, a container startup script could:

	$ ns_exec -cm /bin/bash
	$ umount /dev/pts
	$ mount -t devpts -o newinstance lxcpts /dev/pts
	$ mount -o bind /dev/pts/ptmx /dev/ptmx
	$ /usr/sbin/sshd -p 1234

where 'ns_exec -cm /bin/bash' is calls clone() with CLONE_NEWNS flag and execs
/bin/bash in the child process. A pty created by the sshd is not visible in
the original mount of /dev/pts.

USER-SPACE-IMPACT:
	- See Documentation/fs/devpts.txt (included in next patch) for user-
	  space impact in multi-instance and mixed-mode operation.
TODO:
	- Update mount(8), pts(4) man pages. Highlight impact of not
	  redirecting /dev/ptmx to /dev/pts/ptmx after a multi-instance mount.

Changelog[v6]:
	- [Dave Hansen] Use new get_init_pts_sb() interface
	- [Serge Hallyn] Don't bother displaying 'newinstance' in show_options
	- [Serge Hallyn] Use macros (PARSE_REMOUNT/PARSE_MOUNT) instead of 0/1.
	- [Serge Hallyn] Check error return from get_sb_single() (now
	  get_init_pts_sb())
	- devpts_pty_kill(): don't dput error dentries

Changelog[v5]:
	- Move get_sb_ref() definition to earlier patch
	- Move usage info to Documentation/filesystems/devpts.txt (next patch)
	- Make ptmx node even in init_pts_ns, now that default mode is 0000
	  (defined in earlier patch, enabled here).
	- Cache ptmx dentry and use to update mode during remount
	  (defined in earlier patch, enabled here).
	- Bugfix: explicitly ignore newinstance on remount (if newinstance was
	  specified on remount of initial mount, it would be ignored but
	  /proc/mounts would imply that the option was set)

Changelog[v4]:

	- Update patch description to address H. Peter Anvin's comments
	- Consolidate multi-instance mode code under new config token,
	  CONFIG_DEVPTS_MULTIPLE_INSTANCE.
	- Move usage-details from patch description to
	  Documentation/fs/devpts.txt

Changelog[v3]:
	- Rename new mount option to 'newinstance'
	- Create ptmx nodes only in 'newinstance' mounts
	- Bugfix: parse_mount_options() modifies @data but since we need to
	  parse the @data twice (once in devpts_get_sb() and once during
	  do_remount_sb()), parse a local copy of @data in devpts_get_sb().
	  (restructured code in devpts_get_sb() to fix this)

Changelog[v2]:
	- Support both single-mount and multiple-mount semantics and
	  provide '-onewmnt' option to select the semantics.

Signed-off-by: Sukadev Bhattiprolu <sukadev-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org>
---
 fs/devpts/inode.c |  170 ++++++++++++++++++++++++++++++++++++++++++++++++++--
 1 files changed, 163 insertions(+), 7 deletions(-)

diff --git a/fs/devpts/inode.c b/fs/devpts/inode.c
index 1dfdbf0..f9a9346 100644
--- a/fs/devpts/inode.c
+++ b/fs/devpts/inode.c
@@ -48,10 +48,11 @@ struct pts_mount_opts {
 	gid_t   gid;
 	umode_t mode;
 	umode_t ptmxmode;
+	int newinstance;
 };
 
 enum {
-	Opt_uid, Opt_gid, Opt_mode, Opt_ptmxmode,
+	Opt_uid, Opt_gid, Opt_mode, Opt_ptmxmode, Opt_newinstance,
 	Opt_err
 };
 
@@ -61,6 +62,7 @@ static match_table_t tokens = {
 	{Opt_mode, "mode=%o"},
 #ifdef CONFIG_DEVPTS_MULTIPLE_INSTANCES
 	{Opt_ptmxmode, "ptmxmode=%o"},
+	{Opt_newinstance, "newinstance"},
 #endif
 	{Opt_err, NULL}
 };
@@ -78,13 +80,17 @@ static inline struct pts_fs_info *DEVPTS_SB(struct super_block *sb)
 
 static inline struct super_block *pts_sb_from_inode(struct inode *inode)
 {
+#ifdef CONFIG_DEVPTS_MULTIPLE_INSTANCES
 	if (inode->i_sb->s_magic == DEVPTS_SUPER_MAGIC)
 		return inode->i_sb;
-
+#endif
 	return devpts_mnt->mnt_sb;
 }
 
-static int parse_mount_options(char *data, struct pts_mount_opts *opts)
+#define PARSE_MOUNT	0
+#define PARSE_REMOUNT	1
+
+static int parse_mount_options(char *data, int op, struct pts_mount_opts *opts)
 {
 	char *p;
 
@@ -95,6 +101,10 @@ static int parse_mount_options(char *data, struct pts_mount_opts *opts)
 	opts->mode    = DEVPTS_DEFAULT_MODE;
 	opts->ptmxmode = DEVPTS_DEFAULT_PTMX_MODE;
 
+	/* newinstance makes sense only on initial mount */
+	if (op == PARSE_MOUNT)
+		opts->newinstance = 0;
+
 	while ((p = strsep(&data, ",")) != NULL) {
 		substring_t args[MAX_OPT_ARGS];
 		int token;
@@ -128,6 +138,11 @@ static int parse_mount_options(char *data, struct pts_mount_opts *opts)
 				return -EINVAL;
 			opts->ptmxmode = option & S_IALLUGO;
 			break;
+		case Opt_newinstance:
+			/* newinstance makes sense only on initial mount */
+			if (op == PARSE_MOUNT)
+				opts->newinstance = 1;
+			break;
 #endif
 		default:
 			printk(KERN_ERR "devpts: called with bogus options\n");
@@ -214,7 +229,7 @@ static int devpts_remount(struct super_block *sb, int *flags, char *data)
 	struct pts_fs_info *fsi = DEVPTS_SB(sb);
 	struct pts_mount_opts *opts = &fsi->mount_opts;
 
-	err = parse_mount_options(data, opts);
+	err = parse_mount_options(data, PARSE_REMOUNT, opts);
 
 	/*
 	 * parse_mount_options() restores options to default values
@@ -309,8 +324,100 @@ static int compare_init_pts_sb(struct super_block *s, void *p)
 {
 	if (devpts_mnt)
 		return devpts_mnt->mnt_sb == s;
+	return 0;
+}
+
+#ifdef CONFIG_DEVPTS_MULTIPLE_INSTANCES
+/*
+ * Safely parse the mount options in @data and update @opts.
+ *
+ * devpts ends up parsing options two times during mount, due to the
+ * two modes of operation it supports. The first parse occurs in
+ * devpts_get_sb() when determining the mode (single-instance or
+ * multi-instance mode). The second parse happens in devpts_remount()
+ * or new_pts_mount() depending on the mode.
+ *
+ * Parsing of options modifies the @data making subsequent parsing
+ * incorrect. So make a local copy of @data and parse it.
+ *
+ * Return: 0 On success, -errno on error
+ */
+static int safe_parse_mount_options(void *data, struct pts_mount_opts *opts)
+{
+	int rc;
+	void *datacp;
+
+	if (!data)
+		return 0;
+
+	/* Use kstrdup() ?  */
+	datacp = kmalloc(PAGE_SIZE, GFP_KERNEL);
+	if (!datacp)
+		return -ENOMEM;
+
+	memcpy(datacp, data, PAGE_SIZE);
+	rc = parse_mount_options((char *)datacp, PARSE_MOUNT, opts);
+	kfree(datacp);
+
+	return rc;
+}
+
+/*
+ * Mount a new (private) instance of devpts.  PTYs created in this
+ * instance are independent of the PTYs in other devpts instances.
+ */
+static int new_pts_mount(struct file_system_type *fs_type, int flags,
+		void *data, struct vfsmount *mnt)
+{
+	int err;
+	struct pts_fs_info *fsi;
+	struct pts_mount_opts *opts;
+
+	printk(KERN_NOTICE "devpts: newinstance mount\n");
+
+	err = get_sb_nodev(fs_type, flags, data, devpts_fill_super, mnt);
+	if (err)
+		return err;
+
+	fsi = DEVPTS_SB(mnt->mnt_sb);
+	opts = &fsi->mount_opts;
+
+	err = parse_mount_options(data, PARSE_MOUNT, opts);
+	if (err)
+		goto fail;
+
+	err = mknod_ptmx(mnt->mnt_sb);
+	if (err)
+		goto fail;
 
 	return 0;
+
+fail:
+	dput(mnt->mnt_sb->s_root);
+	deactivate_super(mnt->mnt_sb);
+	return err;
+}
+
+/*
+ * Check if 'newinstance' mount option was specified in @data.
+ *
+ * Return: -errno  	on error (eg: invalid mount options specified)
+ * 	 : 1 		if 'newinstance' mount option was specified
+ * 	 : 0 		if 'newinstance' mount option was NOT specified
+ */
+static int is_new_instance_mount(void *data)
+{
+	int rc;
+	struct pts_mount_opts opts;
+
+	if (!data)
+		return 0;
+
+	rc = safe_parse_mount_options(data, &opts);
+	if (!rc)
+		rc = opts.newinstance;
+
+	return rc;
 }
 
 /*
@@ -358,11 +465,54 @@ static int get_init_pts_sb(struct file_system_type *fs_type, int flags,
         return simple_set_mnt(mnt, s);
 }
 
+/*
+ * Mount or remount the initial kernel mount of devpts. This type of
+ * mount maintains the legacy, single-instance semantics, while the
+ * kernel still allows multiple-instances.
+ */
+static int init_pts_mount(struct file_system_type *fs_type, int flags,
+		void *data, struct vfsmount *mnt)
+{
+	int err;
+
+	err = get_init_pts_sb(fs_type, flags, data, mnt);
+	if (err)
+		 return err;
+
+	err = mknod_ptmx(mnt->mnt_sb);
+	if (err) {
+		dput(mnt->mnt_sb->s_root);
+		deactivate_super(mnt->mnt_sb);
+	}
+
+	return err;
+}
+
 static int devpts_get_sb(struct file_system_type *fs_type,
 	int flags, const char *dev_name, void *data, struct vfsmount *mnt)
 {
-	return get_init_pts_sb(fs_type, flags, data, mnt);
+	int new;
+
+	new = is_new_instance_mount(data);
+	if (new < 0)
+		return new;
+
+	if (new)
+		return new_pts_mount(fs_type, flags, data, mnt);
+
+	return init_pts_mount(fs_type, flags, data, mnt);
 }
+#else
+/*
+ * This supports only the legacy single-instance semantics (no
+ * multiple-instance semantics)
+ */
+static int devpts_get_sb(struct file_system_type *fs_type, int flags,
+		const char *dev_name, void *data, struct vfsmount *mnt)
+{
+	return get_sb_single(fs_type, flags, data, devpts_fill_super, mnt);
+}
+#endif
 
 static void devpts_kill_sb(struct super_block *sb)
 {
@@ -488,12 +638,18 @@ void devpts_pty_kill(struct tty_struct *tty)
 	mutex_lock(&root->d_inode->i_mutex);
 
 	dentry = d_find_alias(inode);
-	if (dentry && !IS_ERR(dentry)) {
+	if (IS_ERR(dentry))
+		goto out;
+
+	if (dentry) {
 		inode->i_nlink--;
 		d_delete(dentry);
-		dput(dentry);
+		dput(dentry);	// d_alloc_name() in devpts_pty_new()
 	}
 
+	dput(dentry);		// d_find_alias above
+
+out:
 	mutex_unlock(&root->d_inode->i_mutex);
 }
 
-- 
1.5.2.5

^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH 9/9] Document usage of multiple-instances of devpts
       [not found] ` <20081015053000.GA2039-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
                     ` (7 preceding siblings ...)
  2008-10-15  5:37   ` [PATCH 8/9] Enable multiple instances of devpts sukadev-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8
@ 2008-10-15  5:38   ` sukadev-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8
       [not found]     ` <20081015053800.GI2215-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
  2008-10-16 15:19   ` [PATCH 0/9] Multiple devpts instances Serge E. Hallyn
  2009-02-19 15:43   ` Daniel Lezcano
  10 siblings, 1 reply; 29+ messages in thread
From: sukadev-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8 @ 2008-10-15  5:38 UTC (permalink / raw)
  To: hpa-YMNOUZJC4hwAvxtiuMwx3w, alan-qBU/x9rampVanCEyBjwyrvXRex20P6io,
	serue-r/Jw6+rmf7HQT0dZR+AlfA, David C. Hansen
  Cc: kyle-hoO6YkzgTuCM0SS3m2neIg, ebiederm-aS9lmoZGLiVWk0Htik3J/w,
	bastian-yyjItF7Rl6lg9hUCZPvPmw, containers-qjLDD68F18O7TbgM5vRIOg,
	xemul-GEFAQzZX7r8dnm+yROfE0A


From c4596977ca34b9664d97efa8681e6711145a22cf Mon Sep 17 00:00:00 2001
From: Sukadev Bhattiprolu <sukadev-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org>
Subject: [PATCH 9/9] Document usage of multiple-instances of devpts

Changelog [v2]:
	- Add note indicating strict isolation is not possible unless all
	  mounts of devpts use the 'newinstance' mount option.

Signed-off-by: Sukadev Bhattiprolu <sukadev-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org>
---
 Documentation/filesystems/devpts.txt |  132 ++++++++++++++++++++++++++++++++++
 1 files changed, 132 insertions(+), 0 deletions(-)
 create mode 100644 Documentation/filesystems/devpts.txt

diff --git a/Documentation/filesystems/devpts.txt b/Documentation/filesystems/devpts.txt
new file mode 100644
index 0000000..68dffd8
--- /dev/null
+++ b/Documentation/filesystems/devpts.txt
@@ -0,0 +1,132 @@
+
+To support containers, we now allow multiple instances of devpts filesystem,
+such that indices of ptys allocated in one instance are independent of indices
+allocated in other instances of devpts.
+
+To preserve backward compatibility, this support for multiple instances is
+enabled only if:
+
+	- CONFIG_DEVPTS_MULTIPLE_INSTANCES=y, and
+	- '-o newinstance' mount option is specified while mounting devpts
+
+IOW, devpts now supports both single-instance and multi-instance semantics.
+
+If CONFIG_DEVPTS_MULTIPLE_INSTANCES=n, there is no change in behavior and
+this referred to as the "legacy" mode. In this mode, the new mount options
+(-o newinstance and -o ptmxmode) will be ignored with a 'bogus option' message
+on console.
+
+If CONFIG_DEVPTS_MULTIPLE_INSTANCES=y and devpts is mounted without the
+'newinstance' option (as in current start-up scripts) the new mount binds
+to the initial kernel mount of devpts. This mode is referred to as the
+'single-instance' mode and the current, single-instance semantics are
+preserved, i.e PTYs are common across the system.
+
+The only difference between this single-instance mode and the legacy mode
+is the presence of new, '/dev/pts/ptmx' node with permissions 0000, which
+can safely be ignored.
+
+If CONFIG_DEVPTS_MULTIPLE_INSTANCES=y and 'newinstance' option is specified,
+the mount is considered to be in the multi-instance mode and a new instance
+of the devpts fs is created. Any ptys created in this instance are independent
+of ptys in other instances of devpts. Like in the single-instance mode, the
+/dev/pts/ptmx node is present. To effectively use the multi-instance mode,
+open of /dev/ptmx must be a redirected to '/dev/pts/ptmx' using a symlink or
+bind-mount.
+
+Eg: A container startup script could do the following:
+
+	$ chmod 0666 /dev/pts/ptmx
+	$ rm /dev/ptmx
+	$ ln -s pts/ptmx /dev/ptmx
+	$ ns_exec -cm /bin/bash
+
+	# We are now in new container
+
+	$ umount /dev/pts
+	$ mount -t devpts -o newinstance lxcpts /dev/pts
+	$ sshd -p 1234
+
+where 'ns_exec -cm /bin/bash' calls clone() with CLONE_NEWNS flag and execs
+/bin/bash in the child process.  A pty created by the sshd is not visible in
+the original mount of /dev/pts.
+
+User-space changes
+------------------
+
+In multi-instance mode (i.e '-o newinstance' mount option is specified at least
+once), following user-space issues should be noted.
+
+1. If -o newinstance mount option is never used, /dev/pts/ptmx can be ignored
+   and no change is needed to system-startup scripts.
+
+2. To effectively use multi-instance mode (i.e -o newinstance is specified)
+   administrators or startup scripts should "redirect" open of /dev/ptmx to
+   /dev/pts/ptmx using either a bind mount or symlink.
+
+	$ mount -t devpts -o newinstance devpts /dev/pts
+
+   followed by either
+
+	$ rm /dev/ptmx
+	$ ln -s pts/ptmx /dev/ptmx
+	$ chmod 666 /dev/pts/ptmx
+   or
+	$ mount -o bind /dev/pts/ptmx /dev/ptmx
+
+3. The '/dev/ptmx -> pts/ptmx' symlink is the preferred method since it
+   enables better error-reporting and treats both single-instance and
+   multi-instance mounts similarly.
+
+   But this method requires that system-startup scripts set the mode of
+   /dev/pts/ptmx correctly (default mode is 0000). The scripts can set the
+   mode by, either
+
+   	- adding ptmxmode mount option to devpts entry in /etc/fstab, or
+	- using 'chmod 0666 /dev/pts/ptmx'
+
+4. If multi-instance mode mount is needed for containers, but the system
+   startup scripts have not yet been updated, container-startup scripts
+   should bind mount /dev/ptmx to /dev/pts/ptmx to avoid breaking single-
+   instance mounts.
+
+   Or, in general, container-startup scripts should use:
+
+	mount -t devpts -o newinstance -o ptmxmode=0666 devpts /dev/pts
+	if [ ! -L /dev/ptmx ]; then
+		mount -o bind /dev/pts/ptmx /dev/ptmx
+	fi
+
+   When all devpts mounts are multi-instance, /dev/ptmx can permanently be
+   a symlink to pts/ptmx and the bind mount can be ignored.
+
+5. A multi-instance mount that is not accompanied by the /dev/ptmx to
+   /dev/pts/ptmx redirection would result in an unusable/unreachable pty.
+
+	mount -t devpts -o newinstance lxcpts /dev/pts
+
+   immediately followed by:
+
+	open("/dev/ptmx")
+
+    would create a pty, say /dev/pts/7, in the initial kernel mount.
+    But /dev/pts/7 would be invisible in the new mount.
+
+6. The permissions for /dev/pts/ptmx node should be specified when mounting
+   /dev/pts, using the '-o ptmxmode=%o' mount option (default is 0000).
+
+	mount -t devpts -o newinstance -o ptmxmode=0644 devpts /dev/pts
+
+   The permissions can be later be changed as usual with 'chmod'.
+
+	chmod 666 /dev/pts/ptmx
+
+7. A mount of devpts without the 'newinstance' option results in binding to
+   initial kernel mount.  This behavior while preserving legacy semantics,
+   does not provide strict isolation in a container environment. i.e by
+   mounting devpts without the 'newinstance' option, a container could
+   get visibility into the 'host' or root container's devpts.
+   
+   To workaround this and have strict isolation, all mounts of devpts,
+   including the mount in the root container, should use the newinstance
+   option.
-- 
1.5.2.5

^ permalink raw reply related	[flat|nested] 29+ messages in thread

* Re: [PATCH 9/9] Document usage of multiple-instances of devpts
       [not found]     ` <20081015053800.GI2215-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
@ 2008-10-15 18:57       ` Serge E. Hallyn
       [not found]         ` <20081015185722.GA30005-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
  0 siblings, 1 reply; 29+ messages in thread
From: Serge E. Hallyn @ 2008-10-15 18:57 UTC (permalink / raw)
  To: sukadev-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8
  Cc: kyle-hoO6YkzgTuCM0SS3m2neIg, David C. Hansen,
	bastian-yyjItF7Rl6lg9hUCZPvPmw, ebiederm-aS9lmoZGLiVWk0Htik3J/w,
	hpa-YMNOUZJC4hwAvxtiuMwx3w, containers-qjLDD68F18O7TbgM5vRIOg,
	xemul-GEFAQzZX7r8dnm+yROfE0A,
	alan-qBU/x9rampVanCEyBjwyrvXRex20P6io

Quoting sukadev-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org (sukadev-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org):
> 
> >From c4596977ca34b9664d97efa8681e6711145a22cf Mon Sep 17 00:00:00 2001
> From: Sukadev Bhattiprolu <sukadev-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org>
> Subject: [PATCH 9/9] Document usage of multiple-instances of devpts
> 
> Changelog [v2]:
> 	- Add note indicating strict isolation is not possible unless all
> 	  mounts of devpts use the 'newinstance' mount option.
> 
> Signed-off-by: Sukadev Bhattiprolu <sukadev-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org>

Looks good.  In the very last part, you might say just a little more to
make sure it's clear:  You want to mount -o newinstance before sshd
or gnome is started in the root container, so that a child container
can't reach your devpts by doing a mount -t devpts without -o
newinstance.  It's not that it's not clear in what you write, it's
more that it's at the very end and brief, so I'm afraid it's not
attention-grabbing enough as is.

> ---
>  Documentation/filesystems/devpts.txt |  132 ++++++++++++++++++++++++++++++++++
>  1 files changed, 132 insertions(+), 0 deletions(-)
>  create mode 100644 Documentation/filesystems/devpts.txt
> 
> diff --git a/Documentation/filesystems/devpts.txt b/Documentation/filesystems/devpts.txt
> new file mode 100644
> index 0000000..68dffd8
> --- /dev/null
> +++ b/Documentation/filesystems/devpts.txt
> @@ -0,0 +1,132 @@
> +
> +To support containers, we now allow multiple instances of devpts filesystem,
> +such that indices of ptys allocated in one instance are independent of indices
> +allocated in other instances of devpts.
> +
> +To preserve backward compatibility, this support for multiple instances is
> +enabled only if:
> +
> +	- CONFIG_DEVPTS_MULTIPLE_INSTANCES=y, and
> +	- '-o newinstance' mount option is specified while mounting devpts
> +
> +IOW, devpts now supports both single-instance and multi-instance semantics.
> +
> +If CONFIG_DEVPTS_MULTIPLE_INSTANCES=n, there is no change in behavior and
> +this referred to as the "legacy" mode. In this mode, the new mount options
> +(-o newinstance and -o ptmxmode) will be ignored with a 'bogus option' message
> +on console.
> +
> +If CONFIG_DEVPTS_MULTIPLE_INSTANCES=y and devpts is mounted without the
> +'newinstance' option (as in current start-up scripts) the new mount binds
> +to the initial kernel mount of devpts. This mode is referred to as the
> +'single-instance' mode and the current, single-instance semantics are
> +preserved, i.e PTYs are common across the system.
> +
> +The only difference between this single-instance mode and the legacy mode
> +is the presence of new, '/dev/pts/ptmx' node with permissions 0000, which
> +can safely be ignored.
> +
> +If CONFIG_DEVPTS_MULTIPLE_INSTANCES=y and 'newinstance' option is specified,
> +the mount is considered to be in the multi-instance mode and a new instance
> +of the devpts fs is created. Any ptys created in this instance are independent
> +of ptys in other instances of devpts. Like in the single-instance mode, the
> +/dev/pts/ptmx node is present. To effectively use the multi-instance mode,
> +open of /dev/ptmx must be a redirected to '/dev/pts/ptmx' using a symlink or
> +bind-mount.
> +
> +Eg: A container startup script could do the following:
> +
> +	$ chmod 0666 /dev/pts/ptmx
> +	$ rm /dev/ptmx
> +	$ ln -s pts/ptmx /dev/ptmx
> +	$ ns_exec -cm /bin/bash
> +
> +	# We are now in new container
> +
> +	$ umount /dev/pts
> +	$ mount -t devpts -o newinstance lxcpts /dev/pts
> +	$ sshd -p 1234
> +
> +where 'ns_exec -cm /bin/bash' calls clone() with CLONE_NEWNS flag and execs
> +/bin/bash in the child process.  A pty created by the sshd is not visible in
> +the original mount of /dev/pts.
> +
> +User-space changes
> +------------------
> +
> +In multi-instance mode (i.e '-o newinstance' mount option is specified at least
> +once), following user-space issues should be noted.
> +
> +1. If -o newinstance mount option is never used, /dev/pts/ptmx can be ignored
> +   and no change is needed to system-startup scripts.
> +
> +2. To effectively use multi-instance mode (i.e -o newinstance is specified)
> +   administrators or startup scripts should "redirect" open of /dev/ptmx to
> +   /dev/pts/ptmx using either a bind mount or symlink.
> +
> +	$ mount -t devpts -o newinstance devpts /dev/pts
> +
> +   followed by either
> +
> +	$ rm /dev/ptmx
> +	$ ln -s pts/ptmx /dev/ptmx
> +	$ chmod 666 /dev/pts/ptmx
> +   or
> +	$ mount -o bind /dev/pts/ptmx /dev/ptmx
> +
> +3. The '/dev/ptmx -> pts/ptmx' symlink is the preferred method since it
> +   enables better error-reporting and treats both single-instance and
> +   multi-instance mounts similarly.
> +
> +   But this method requires that system-startup scripts set the mode of
> +   /dev/pts/ptmx correctly (default mode is 0000). The scripts can set the
> +   mode by, either
> +
> +   	- adding ptmxmode mount option to devpts entry in /etc/fstab, or
> +	- using 'chmod 0666 /dev/pts/ptmx'
> +
> +4. If multi-instance mode mount is needed for containers, but the system
> +   startup scripts have not yet been updated, container-startup scripts
> +   should bind mount /dev/ptmx to /dev/pts/ptmx to avoid breaking single-
> +   instance mounts.
> +
> +   Or, in general, container-startup scripts should use:
> +
> +	mount -t devpts -o newinstance -o ptmxmode=0666 devpts /dev/pts
> +	if [ ! -L /dev/ptmx ]; then
> +		mount -o bind /dev/pts/ptmx /dev/ptmx
> +	fi
> +
> +   When all devpts mounts are multi-instance, /dev/ptmx can permanently be
> +   a symlink to pts/ptmx and the bind mount can be ignored.
> +
> +5. A multi-instance mount that is not accompanied by the /dev/ptmx to
> +   /dev/pts/ptmx redirection would result in an unusable/unreachable pty.
> +
> +	mount -t devpts -o newinstance lxcpts /dev/pts
> +
> +   immediately followed by:
> +
> +	open("/dev/ptmx")
> +
> +    would create a pty, say /dev/pts/7, in the initial kernel mount.
> +    But /dev/pts/7 would be invisible in the new mount.
> +
> +6. The permissions for /dev/pts/ptmx node should be specified when mounting
> +   /dev/pts, using the '-o ptmxmode=%o' mount option (default is 0000).
> +
> +	mount -t devpts -o newinstance -o ptmxmode=0644 devpts /dev/pts
> +
> +   The permissions can be later be changed as usual with 'chmod'.
> +
> +	chmod 666 /dev/pts/ptmx
> +
> +7. A mount of devpts without the 'newinstance' option results in binding to
> +   initial kernel mount.  This behavior while preserving legacy semantics,
> +   does not provide strict isolation in a container environment. i.e by
> +   mounting devpts without the 'newinstance' option, a container could
> +   get visibility into the 'host' or root container's devpts.
> +   
> +   To workaround this and have strict isolation, all mounts of devpts,
> +   including the mount in the root container, should use the newinstance
> +   option.
> -- 
> 1.5.2.5

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 9/9] Document usage of multiple-instances of devpts
       [not found]         ` <20081015185722.GA30005-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
@ 2008-10-15 19:03           ` H. Peter Anvin
       [not found]             ` <48F63E76.3030907-YMNOUZJC4hwAvxtiuMwx3w@public.gmane.org>
  0 siblings, 1 reply; 29+ messages in thread
From: H. Peter Anvin @ 2008-10-15 19:03 UTC (permalink / raw)
  To: Serge E. Hallyn
  Cc: kyle-hoO6YkzgTuCM0SS3m2neIg, xemul-GEFAQzZX7r8dnm+yROfE0A,
	David C. Hansen, bastian-yyjItF7Rl6lg9hUCZPvPmw,
	ebiederm-aS9lmoZGLiVWk0Htik3J/w,
	containers-qjLDD68F18O7TbgM5vRIOg,
	sukadev-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8,
	alan-qBU/x9rampVanCEyBjwyrvXRex20P6io

Serge E. Hallyn wrote:
> Looks good.  In the very last part, you might say just a little more to
> make sure it's clear:  You want to mount -o newinstance before sshd
> or gnome is started in the root container, so that a child container
> can't reach your devpts by doing a mount -t devpts without -o
> newinstance.  It's not that it's not clear in what you write, it's
> more that it's at the very end and brief, so I'm afraid it's not
> attention-grabbing enough as is.

Actually, you should just enable newinstance everywhere, in particular 
in your fstab, so that ALL instances of devpts in the system have 
newinstance (leaving the legacy one unreachable.)

In that sense I think your text above is more confusing than what 
Sukadev had.

	-hpa

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 9/9] Document usage of multiple-instances of devpts
       [not found]             ` <48F63E76.3030907-YMNOUZJC4hwAvxtiuMwx3w@public.gmane.org>
@ 2008-10-15 19:48               ` Serge E. Hallyn
  0 siblings, 0 replies; 29+ messages in thread
From: Serge E. Hallyn @ 2008-10-15 19:48 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: kyle-hoO6YkzgTuCM0SS3m2neIg, xemul-GEFAQzZX7r8dnm+yROfE0A,
	David C. Hansen, bastian-yyjItF7Rl6lg9hUCZPvPmw,
	ebiederm-aS9lmoZGLiVWk0Htik3J/w,
	containers-qjLDD68F18O7TbgM5vRIOg,
	sukadev-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8,
	alan-qBU/x9rampVanCEyBjwyrvXRex20P6io

Quoting H. Peter Anvin (hpa-YMNOUZJC4hwAvxtiuMwx3w@public.gmane.org):
> Serge E. Hallyn wrote:
>> Looks good.  In the very last part, you might say just a little more to
>> make sure it's clear:  You want to mount -o newinstance before sshd
>> or gnome is started in the root container, so that a child container
>> can't reach your devpts by doing a mount -t devpts without -o
>> newinstance.  It's not that it's not clear in what you write, it's
>> more that it's at the very end and brief, so I'm afraid it's not
>> attention-grabbing enough as is.
>
> Actually, you should just enable newinstance everywhere, in particular  
> in your fstab, so that ALL instances of devpts in the system have  
> newinstance (leaving the legacy one unreachable.)
>
> In that sense I think your text above is more confusing than what  
> Sukadev had.
>
> 	-hpa

That's fine, I just want a clearer louder warning that without that, a
container is not isolated from your devpts.

Maybe just 'WARNING" above point 7?

Or just leave it.  You're right, his text is plenty clear.

-serge

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 0/9] Multiple devpts instances
       [not found] ` <20081015053000.GA2039-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
                     ` (8 preceding siblings ...)
  2008-10-15  5:38   ` [PATCH 9/9] Document usage of multiple-instances " sukadev-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8
@ 2008-10-16 15:19   ` Serge E. Hallyn
  2009-02-19 15:43   ` Daniel Lezcano
  10 siblings, 0 replies; 29+ messages in thread
From: Serge E. Hallyn @ 2008-10-16 15:19 UTC (permalink / raw)
  To: sukadev-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8
  Cc: kyle-hoO6YkzgTuCM0SS3m2neIg, David C. Hansen,
	bastian-yyjItF7Rl6lg9hUCZPvPmw, ebiederm-aS9lmoZGLiVWk0Htik3J/w,
	hpa-YMNOUZJC4hwAvxtiuMwx3w, containers-qjLDD68F18O7TbgM5vRIOg,
	xemul-GEFAQzZX7r8dnm+yROfE0A,
	alan-qBU/x9rampVanCEyBjwyrvXRex20P6io

Quoting sukadev-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org (sukadev-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org):
> 
> Enable multiple instances of devpts filesystem so each container can allocate
> ptys independently.

Hey Suka,

I was playing with this a bit last night, trying to make it oops, i.e.
create an sshd in a child ptsns ssh to it, and umount -l /dev/pts out
from under me.  Etc.  Ran ltp.

All seemed fine.

Tested-by: Serge Hallyn <serue-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
Acked-by: Serge Hallyn <serue-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>

thanks,
-serge

> User interface:
> 
> 	Since supporting multiple mounts of devpts can break user-space, this
> 	feature is enabled only if:
> 
> 		- CONFIG_DEVPTS_MULTIPLE_INSTANCES=y (new config token), and
> 		- new mount option, -o newinstance is specified
> 
> 	If CONFIG_DEVPTS_MULTIPLE_INSTANCES=n, there should be no change in
> 	behavior.
> 
> 	See [PATCH 9/9] - Documentation/filesystems/devpts.txt for detailed
> 	usage/compatibility information.
> 
> 		[PATCH 1/9] Remove devpts_root global
> 		[PATCH 2/9] Per-mount allocated_ptys
> 		[PATCH 3/9] Per-mount 'config' object
> 		[PATCH 4/9] Extract option parsing to new function
> 		[PATCH 5/9] Add DEVPTS_MULTIPLE_INSTANCES config token
> 		[PATCH 6/9] Define mknod_ptmx()
> 		[PATCH 7/9] Define get_init_pts_sb()
> 		[PATCH 8/9] Enable multiple instances of devpts
> 		[PATCH 9/9] Document usage of multiple-instances of devpts
> 	
> Implementation notes:
> 
> 	1. To enable multiple mounts of /dev/pts, (most) devpts interfaces
> 	   need to know which instance of devpts is being accessed. This
> 	   patchset uses the 'struct inode' or 'struct tty_struct' of the
> 	   device being accessed to identify the appropriate devpts instance.
> 	   Hence the need for the /dev/pts/ptmx bind-mount.
> 
> 	2. Mount options must be parsed twice during mount (once to determine
> 	   the mode of mount (single/multi-instance) and once to actually
> 	   save the options. There does not seem to be an easy way to parse
> 	   once and reuse (See 'safe_process_mount_opts()' in [PATCH 9/10])
> 
> 
> Changelog [v5]:
> 	- Merge all option-parsing (previously in patches 6,7) into patch 6.
> 	- Bugfixes (see changelog in patches 6 and 8)
> 	- Replace get_sb_ref() with devpts-specific get_init_pts_sb() (patch 7)
> 	- Minor update to devpts.txt documentation (patch 9)
> 
> 
> Changelog [v4]:
> 
> 	- Port to 2008-09-04 ttydev tree (and drop patches that were merged in)
> 	- Add DEVPTS_MULTIPLE_INSTANCES config token (patch 5) and move new
> 	  behavior under #ifdef CONFIG_DEVPTS_MULTIPLE_INSTANCES. 
> 	- Create ptmx node even in initial mount 
> 	- Change default permissions of pts/ptmx node to 0000 (patch 6)
> 	- Cache ptmx dentry and use to update permissions of ptmx node on
> 	  remount so legacy mode can use the node with a simpler change to
> 	  /etc/fstab (patch 7)
> 	- Move get_sb_ref() helper code to separate patch (patch 8)
> 	- Add Documentation/filesystems/devpts.txt and describe usage info
> 	  in that file.
> 	  
> Changelog [v3]:
> 
> 	- Port to 2008-08-28 ttydev tree
> 	- Rename new mount options to 'ptmxmode' and 'newinstance'.
> 	- [Alan Cox] Use tty driver data to identify devpts (this is used to
> 	  cleanup get_node() in devpts_pty_kill()).
> 	- [H. Peter Anvin] get_node() cleanup in devpts (which was enabled by
> 	  the inode/tty parameter to devpts interfaces)
> 	- Bugfix in multi-mount mode (see Patch 11/11).
> 	- Executed pty tests in LTP (in both single-instance and multi-instance
> 	  mode)
> 	- Should be bisect-safe :-)
> 
> Changelog [v2]:
> 
> 	- New mount option '-o newmnt' added (patch 8/8)
> 	- Support both single-mount and multi-mount semantics (patch 8/8)
> 	- Automatically create ptmx node when devpts is mounted (patch 7/8)
> 	- Extract option parsing code to new function (patch 6/8)
> 	- Make 'config' params per-mount variables (patch 5/8)
> 	- Slightly re-ordered existing patches in set (patches 1/8, 2/8)
> 
> TODO:
> 	- Do we need a '-o ptmxuid' and '-o ptmxgid' options as well ?
> 	- Update mount(8) man page
> 	- (Sometime in future) Remove even initial kernel mount of devpts 
> 	- Any other good test suites to test this (besides LTP, sshd).

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 0/9] Multiple devpts instances
       [not found] ` <20081015053000.GA2039-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
                     ` (9 preceding siblings ...)
  2008-10-16 15:19   ` [PATCH 0/9] Multiple devpts instances Serge E. Hallyn
@ 2009-02-19 15:43   ` Daniel Lezcano
       [not found]     ` <499D7E13.10601-GANU6spQydw@public.gmane.org>
  10 siblings, 1 reply; 29+ messages in thread
From: Daniel Lezcano @ 2009-02-19 15:43 UTC (permalink / raw)
  To: sukadev-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8
  Cc: kyle-hoO6YkzgTuCM0SS3m2neIg, David C. Hansen,
	bastian-yyjItF7Rl6lg9hUCZPvPmw, ebiederm-aS9lmoZGLiVWk0Htik3J/w,
	hpa-YMNOUZJC4hwAvxtiuMwx3w, containers-qjLDD68F18O7TbgM5vRIOg,
	alan-qBU/x9rampVanCEyBjwyrvXRex20P6io,
	xemul-GEFAQzZX7r8dnm+yROfE0A

sukadev-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org wrote:
> Enable multiple instances of devpts filesystem so each container can allocate
> ptys independently.
>   
Hi suka,

It looks like the /proc/sys/kernel/pty/max and nr are not virtualized.
Modifying in the container the "max" pty, that impacts the init_pty.
Same as nr which does not show the real number of pty allocated for the 
container.

Are you planning to fix this ?

Thanks
  -- Daniel

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 0/9] Multiple devpts instances
       [not found]     ` <499D7E13.10601-GANU6spQydw@public.gmane.org>
@ 2009-02-19 17:32       ` H. Peter Anvin
       [not found]         ` <499D97B1.1090902-YMNOUZJC4hwAvxtiuMwx3w@public.gmane.org>
  0 siblings, 1 reply; 29+ messages in thread
From: H. Peter Anvin @ 2009-02-19 17:32 UTC (permalink / raw)
  To: Daniel Lezcano
  Cc: kyle-hoO6YkzgTuCM0SS3m2neIg, David C. Hansen,
	bastian-yyjItF7Rl6lg9hUCZPvPmw, ebiederm-aS9lmoZGLiVWk0Htik3J/w,
	containers-qjLDD68F18O7TbgM5vRIOg,
	sukadev-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8,
	alan-qBU/x9rampVanCEyBjwyrvXRex20P6io,
	xemul-GEFAQzZX7r8dnm+yROfE0A

Daniel Lezcano wrote:
> sukadev-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org wrote:
>> Enable multiple instances of devpts filesystem so each container can
>> allocate
>> ptys independently.
>>   
> Hi suka,
> 
> It looks like the /proc/sys/kernel/pty/max and nr are not virtualized.
> Modifying in the container the "max" pty, that impacts the init_pty.
> Same as nr which does not show the real number of pty allocated for the
> container.
> 
> Are you planning to fix this ?
> 

That's a separate issue, i.e. a resource allocation
localization/globalization issue.  The main reason for these limits is
top put a cap on the amount of low kernel memory used on 32-bit systems
especially, which is somewhat inherently global.

Resource limit partitioning is a much bigger and orthogonal problem.

	-hpa

-- 
H. Peter Anvin, Intel Open Source Technology Center
I work for Intel.  I don't speak on their behalf.

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 0/9] Multiple devpts instances
       [not found]         ` <499D97B1.1090902-YMNOUZJC4hwAvxtiuMwx3w@public.gmane.org>
@ 2009-02-19 18:09           ` Daniel Lezcano
       [not found]             ` <499DA069.3040603-GANU6spQydw@public.gmane.org>
  0 siblings, 1 reply; 29+ messages in thread
From: Daniel Lezcano @ 2009-02-19 18:09 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: kyle-hoO6YkzgTuCM0SS3m2neIg, David C. Hansen,
	bastian-yyjItF7Rl6lg9hUCZPvPmw, ebiederm-aS9lmoZGLiVWk0Htik3J/w,
	containers-qjLDD68F18O7TbgM5vRIOg,
	sukadev-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8,
	alan-qBU/x9rampVanCEyBjwyrvXRex20P6io,
	xemul-GEFAQzZX7r8dnm+yROfE0A

H. Peter Anvin wrote:
> Daniel Lezcano wrote:
>   
>> sukadev-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org wrote:
>>     
>>> Enable multiple instances of devpts filesystem so each container can
>>> allocate
>>> ptys independently.
>>>   
>>>       
>> Hi suka,
>>
>> It looks like the /proc/sys/kernel/pty/max and nr are not virtualized.
>> Modifying in the container the "max" pty, that impacts the init_pty.
>> Same as nr which does not show the real number of pty allocated for the
>> container.
>>
>> Are you planning to fix this ?
>>
>>     
>
> That's a separate issue, i.e. a resource allocation
> localization/globalization issue.  The main reason for these limits is
> top put a cap on the amount of low kernel memory used on 32-bit systems
> especially, which is somewhat inherently global.
>
> Resource limit partitioning is a much bigger and orthogonal problem.
>   
In this case we don't have the pty allocated independently, no ?
I mean one container can allocate 4095 pty, making a pty starvation for 
others containers. Or imagine I am a vilain and I want to mess the other 
containers, I can do echo 0 > /proc/sys/kernel/pty/max.
AFAIR, we said people making isolation of a resource is in charge (if it 
is relevant), to take into account the /proc/sys part.

For example, making the network per namespace all the network 
configuration variable located in /proc/sys/net are per namespace too. 
When it is irrelevant the file is read-only or just not displayed.

IMHO, pty/max and pty/nr is part of the "multiple devpts instances" feature.

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 0/9] Multiple devpts instances
       [not found]             ` <499DA069.3040603-GANU6spQydw@public.gmane.org>
@ 2009-02-19 19:58               ` H. Peter Anvin
       [not found]                 ` <499DB9DA.2070301-YMNOUZJC4hwAvxtiuMwx3w@public.gmane.org>
  0 siblings, 1 reply; 29+ messages in thread
From: H. Peter Anvin @ 2009-02-19 19:58 UTC (permalink / raw)
  To: Daniel Lezcano
  Cc: kyle-hoO6YkzgTuCM0SS3m2neIg, David C. Hansen,
	bastian-yyjItF7Rl6lg9hUCZPvPmw, ebiederm-aS9lmoZGLiVWk0Htik3J/w,
	containers-qjLDD68F18O7TbgM5vRIOg,
	sukadev-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8,
	alan-qBU/x9rampVanCEyBjwyrvXRex20P6io,
	xemul-GEFAQzZX7r8dnm+yROfE0A

Daniel Lezcano wrote:
>>
>> Resource limit partitioning is a much bigger and orthogonal problem.
>>   
> In this case we don't have the pty allocated independently, no ?
> I mean one container can allocate 4095 pty, making a pty starvation for 
> others containers. Or imagine I am a vilain and I want to mess the other 
> containers, I can do echo 0 > /proc/sys/kernel/pty/max.
> AFAIR, we said people making isolation of a resource is in charge (if it 
> is relevant), to take into account the /proc/sys part.
> 
> For example, making the network per namespace all the network 
> configuration variable located in /proc/sys/net are per namespace too. 
> When it is irrelevant the file is read-only or just not displayed.
> 
> IMHO, pty/max and pty/nr is part of the "multiple devpts instances" 
> feature.
> 

Naming and resource partitioning are two orthogonal issues, regardless 
of what's IYHO.

Really.  You have the same classes of issues with ANY allocatable 
resource in the system.  Period.  Furthermore, there are quite a few 
applications which want one and not the other.  Trying to entangle them 
is broken.

	-hpa

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 0/9] Multiple devpts instances
       [not found]                 ` <499DB9DA.2070301-YMNOUZJC4hwAvxtiuMwx3w@public.gmane.org>
@ 2009-02-19 22:28                   ` Eric W. Biederman
       [not found]                     ` <m1vdr6xdqv.fsf-+imSwln9KH6u2/kzUuoCbdi2O/JbrIOy@public.gmane.org>
  2009-02-19 22:42                   ` Daniel Lezcano
  1 sibling, 1 reply; 29+ messages in thread
From: Eric W. Biederman @ 2009-02-19 22:28 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: kyle-hoO6YkzgTuCM0SS3m2neIg, David C. Hansen,
	bastian-yyjItF7Rl6lg9hUCZPvPmw, containers-qjLDD68F18O7TbgM5vRIOg,
	sukadev-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8,
	alan-qBU/x9rampVanCEyBjwyrvXRex20P6io,
	xemul-GEFAQzZX7r8dnm+yROfE0A

"H. Peter Anvin" <hpa-YMNOUZJC4hwAvxtiuMwx3w@public.gmane.org> writes:

> Daniel Lezcano wrote:
>>>
>>> Resource limit partitioning is a much bigger and orthogonal problem.
>>>   
>> In this case we don't have the pty allocated independently, no ?
>> I mean one container can allocate 4095 pty, making a pty starvation for others
>> containers. Or imagine I am a vilain and I want to mess the other containers,
>> I can do echo 0 > /proc/sys/kernel/pty/max.
>> AFAIR, we said people making isolation of a resource is in charge (if it is
>> relevant), to take into account the /proc/sys part.
>>
>> For example, making the network per namespace all the network configuration
>> variable located in /proc/sys/net are per namespace too. When it is irrelevant
>> the file is read-only or just not displayed.

Such as for global limits like /proc/sys/kernel/pty/max.
The design is a little different here because we do this at a filesystem level.


>> IMHO, pty/max and pty/nr is part of the "multiple devpts instances" feature.
>>
>
> Naming and resource partitioning are two orthogonal issues, regardless of what's
> IYHO.
>
> Really.  You have the same classes of issues with ANY allocatable
> resource in the system.  Period.  Furthermore, there are quite a few
> applications which want one and not the other.  Trying to entangle
> them is broken.

Peter they are entangled issues because the limits frequently show up
in the naming.  pids are a good example of that.

That said with the approach we have taken with ptys, is to move all of the
relevant files and state into the pty filesystem.  That is a very good
model for new development, especially because it does not require magic
systems calls or magic glue elsewhere in the kernel.  Using that model
if we are to add limits on the names generated by a particular devpts
instance those control files should likely live in devpts itself.

Eric

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 0/9] Multiple devpts instances
       [not found]                 ` <499DB9DA.2070301-YMNOUZJC4hwAvxtiuMwx3w@public.gmane.org>
  2009-02-19 22:28                   ` Eric W. Biederman
@ 2009-02-19 22:42                   ` Daniel Lezcano
       [not found]                     ` <499DE06E.4030108-GANU6spQydw@public.gmane.org>
  1 sibling, 1 reply; 29+ messages in thread
From: Daniel Lezcano @ 2009-02-19 22:42 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: kyle-hoO6YkzgTuCM0SS3m2neIg, David C. Hansen,
	bastian-yyjItF7Rl6lg9hUCZPvPmw, ebiederm-aS9lmoZGLiVWk0Htik3J/w,
	containers-qjLDD68F18O7TbgM5vRIOg,
	sukadev-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8,
	alan-qBU/x9rampVanCEyBjwyrvXRex20P6io,
	xemul-GEFAQzZX7r8dnm+yROfE0A

H. Peter Anvin wrote:
> Daniel Lezcano wrote:
>>>
>>> Resource limit partitioning is a much bigger and orthogonal problem.
>>>   
>> In this case we don't have the pty allocated independently, no ?
>> I mean one container can allocate 4095 pty, making a pty starvation 
>> for others containers. Or imagine I am a vilain and I want to mess 
>> the other containers, I can do echo 0 > /proc/sys/kernel/pty/max.
>> AFAIR, we said people making isolation of a resource is in charge (if 
>> it is relevant), to take into account the /proc/sys part.
>>
>> For example, making the network per namespace all the network 
>> configuration variable located in /proc/sys/net are per namespace 
>> too. When it is irrelevant the file is read-only or just not displayed.
>>
>> IMHO, pty/max and pty/nr is part of the "multiple devpts instances" 
>> feature.
>>
>
> Naming and resource partitioning are two orthogonal issues, regardless 
> of what's IYHO.
>
> Really.  You have the same classes of issues with ANY allocatable 
> resource in the system.  Period.  Furthermore, there are quite a few 
> applications which want one and not the other.  Trying to entangle 
> them is broken.
Mmh, perhaps there is a misunderstanding here.

The devpts new instance has been principally implemented for the 
container isolation. The container chroots to a private rootfs, does a 
new instance of devpts, mount it to /dev/pts and should remount /proc too.

The first implementation of the devpts was a namespace approach but 
finally it looks like it was not necessary to use a new clone flag 
because the mount namespace with the ability to do multiple instances of 
devpts was enough.

Each time we implemented a new namespace, we tried to take into account 
the /proc/sys part.
For example, you can modify /proc/sys/kernel/msgmax without impacting 
the other namespaces, you can modify /proc/sys/net/unix/max_dgram_qlen 
without having this configuration being propagated to other namespaces.

One other good example is the /proc/sys/net/ipv4/route/flush where the 
routes are flushed only for the current namespace. Why we did this ? 
because someone in another container can flush the routes of the other 
containers.

I agree we can found thousand of example of different resources which 
are not partitioned and if you refer to multiple instances of devpts in 
the same context, it is probably pointless to take into account the sysctl.

But if I am able to create a new instance of devpts for a container and 
modify the configuration of another devpts from this container, is it 
acceptable ? Can we convince people to use the containers for security 
and have anybody able to make a pty starvation from one container to 
another ?
If it is too much complicated to handle one value per new devpts 
instance, IMHO /proc/sys/kernel/pty/max should be, at least, read-only 
for the new instance, no ?

Thanks
  -- Daniel

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 0/9] Multiple devpts instances
       [not found]                     ` <499DE06E.4030108-GANU6spQydw@public.gmane.org>
@ 2009-02-19 22:46                       ` H. Peter Anvin
  2009-02-19 23:59                       ` Eric W. Biederman
  1 sibling, 0 replies; 29+ messages in thread
From: H. Peter Anvin @ 2009-02-19 22:46 UTC (permalink / raw)
  To: Daniel Lezcano
  Cc: kyle-hoO6YkzgTuCM0SS3m2neIg, David C. Hansen,
	bastian-yyjItF7Rl6lg9hUCZPvPmw, ebiederm-aS9lmoZGLiVWk0Htik3J/w,
	containers-qjLDD68F18O7TbgM5vRIOg,
	sukadev-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8,
	alan-qBU/x9rampVanCEyBjwyrvXRex20P6io,
	xemul-GEFAQzZX7r8dnm+yROfE0A

Daniel Lezcano wrote:
> 
> But if I am able to create a new instance of devpts for a container and 
> modify the configuration of another devpts from this container, is it 
> acceptable ? Can we convince people to use the containers for security 
> and have anybody able to make a pty starvation from one container to 
> another ?
> If it is too much complicated to handle one value per new devpts 
> instance, IMHO /proc/sys/kernel/pty/max should be, at least, read-only 
> for the new instance, no ?
> 

First of all, there is no such thing... the devpts instance is simply 
another filesystem, whereas the /proc/sys entry is a global limit on the 
total number of ptys in the system.  Again, one of thousands, and yes, 
they probably should ALL be readonly in a container environment.  That 
has to be set up separately than the devpts filesystem, because the 
devpts filesystem is not tied to procfs or even containers in any way.

	-hpa

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 0/9] Multiple devpts instances
       [not found]                     ` <499DE06E.4030108-GANU6spQydw@public.gmane.org>
  2009-02-19 22:46                       ` H. Peter Anvin
@ 2009-02-19 23:59                       ` Eric W. Biederman
       [not found]                         ` <m1eixuvv00.fsf-+imSwln9KH6u2/kzUuoCbdi2O/JbrIOy@public.gmane.org>
  1 sibling, 1 reply; 29+ messages in thread
From: Eric W. Biederman @ 2009-02-19 23:59 UTC (permalink / raw)
  To: Daniel Lezcano
  Cc: kyle-hoO6YkzgTuCM0SS3m2neIg, David C. Hansen,
	bastian-yyjItF7Rl6lg9hUCZPvPmw, H. Peter Anvin,
	containers-qjLDD68F18O7TbgM5vRIOg,
	sukadev-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8,
	alan-qBU/x9rampVanCEyBjwyrvXRex20P6io,
	xemul-GEFAQzZX7r8dnm+yROfE0A

Daniel Lezcano <daniel.lezcano-GANU6spQydw@public.gmane.org> writes:

> But if I am able to create a new instance of devpts for a container and modify
> the configuration of another devpts from this container, is it acceptable ? Can
> we convince people to use the containers for security and have anybody able to
> make a pty starvation from one container to another ?

I hardly how that is significant.  Anyone can allocate the rest of the possible
pty's today.  The situation does not get worse with devpts.

If you want security and permission arguments get with Serge and finish
the uid namespace.  The you will have a user that looks like root but
does not have permissions to do most things.

> If it is too much complicated to handle one value per new devpts instance, IMHO
> /proc/sys/kernel/pty/max should be, at least, read-only for the new instance, no?

No.  Either we add a pty_max value to the filesystem like we did with ptmx
or we forget it.

Eric

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 0/9] Multiple devpts instances
       [not found]                     ` <m1vdr6xdqv.fsf-+imSwln9KH6u2/kzUuoCbdi2O/JbrIOy@public.gmane.org>
@ 2009-02-20  4:22                       ` H. Peter Anvin
  0 siblings, 0 replies; 29+ messages in thread
From: H. Peter Anvin @ 2009-02-20  4:22 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: kyle-hoO6YkzgTuCM0SS3m2neIg, David C. Hansen,
	bastian-yyjItF7Rl6lg9hUCZPvPmw, containers-qjLDD68F18O7TbgM5vRIOg,
	sukadev-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8,
	alan-qBU/x9rampVanCEyBjwyrvXRex20P6io,
	xemul-GEFAQzZX7r8dnm+yROfE0A

Eric W. Biederman wrote:
>>
>> Really.  You have the same classes of issues with ANY allocatable
>> resource in the system.  Period.  Furthermore, there are quite a few
>> applications which want one and not the other.  Trying to entangle
>> them is broken.
> 
> Peter they are entangled issues because the limits frequently show up
> in the naming.  pids are a good example of that.
> 

No.  The *reason* for these limits are a matter of resource control, and
that has to at least have the ability to be global.

Entangling them because it peers through somewhat in the naming is
nonsense, at least in this case.

Containers may very well want resource control, but it's a separate
issue from naming.

	-hpa

-- 
H. Peter Anvin, Intel Open Source Technology Center
I work for Intel.  I don't speak on their behalf.

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 0/9] Multiple devpts instances
       [not found]                         ` <m1eixuvv00.fsf-+imSwln9KH6u2/kzUuoCbdi2O/JbrIOy@public.gmane.org>
@ 2009-02-23 20:56                           ` Serge E. Hallyn
       [not found]                             ` <20090223205609.GA32351-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
  0 siblings, 1 reply; 29+ messages in thread
From: Serge E. Hallyn @ 2009-02-23 20:56 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: kyle-hoO6YkzgTuCM0SS3m2neIg, David C. Hansen,
	bastian-yyjItF7Rl6lg9hUCZPvPmw, H. Peter Anvin,
	containers-qjLDD68F18O7TbgM5vRIOg,
	sukadev-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8,
	alan-qBU/x9rampVanCEyBjwyrvXRex20P6io,
	xemul-GEFAQzZX7r8dnm+yROfE0A

Quoting Eric W. Biederman (ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org):
> Daniel Lezcano <daniel.lezcano-GANU6spQydw@public.gmane.org> writes:
> 
> > But if I am able to create a new instance of devpts for a container and modify
> > the configuration of another devpts from this container, is it acceptable ? Can
> > we convince people to use the containers for security and have anybody able to
> > make a pty starvation from one container to another ?
> 
> I hardly how that is significant.  Anyone can allocate the rest of the possible
> pty's today.  The situation does not get worse with devpts.
> 
> If you want security and permission arguments get with Serge and finish
> the uid namespace.  The you will have a user that looks like root but
> does not have permissions to do most things.

Right, and in particular the way it would partially solve this issue is
that the procsys limit file would be owned by root in the initial uid
namespace, so root in a child container would not be able to write to
it.

Defining a new mount option to set a per-sb limit seems useful though,
as I could easily see wanting to limit containers (on a 1000-container
system) to 3 ptys each for instance.

> > If it is too much complicated to handle one value per new devpts instance, IMHO
> > /proc/sys/kernel/pty/max should be, at least, read-only for the new instance, no?
> 
> No.  Either we add a pty_max value to the filesystem like we did with ptmx
> or we forget it.

-serge

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 0/9] Multiple devpts instances
       [not found]                             ` <20090223205609.GA32351-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
@ 2009-02-23 21:18                               ` H. Peter Anvin
       [not found]                                 ` <49A31299.8040501-YMNOUZJC4hwAvxtiuMwx3w@public.gmane.org>
  2009-02-23 21:19                               ` Daniel Lezcano
  1 sibling, 1 reply; 29+ messages in thread
From: H. Peter Anvin @ 2009-02-23 21:18 UTC (permalink / raw)
  To: Serge E. Hallyn
  Cc: kyle-hoO6YkzgTuCM0SS3m2neIg, David C. Hansen,
	bastian-yyjItF7Rl6lg9hUCZPvPmw, Eric W. Biederman,
	containers-qjLDD68F18O7TbgM5vRIOg,
	sukadev-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8,
	alan-qBU/x9rampVanCEyBjwyrvXRex20P6io,
	xemul-GEFAQzZX7r8dnm+yROfE0A

Serge E. Hallyn wrote:
>>
>> If you want security and permission arguments get with Serge and finish
>> the uid namespace.  The you will have a user that looks like root but
>> does not have permissions to do most things.
> 
> Right, and in particular the way it would partially solve this issue is
> that the procsys limit file would be owned by root in the initial uid
> namespace, so root in a child container would not be able to write to
> it.
> 

No, uid namespace is not the right thing for this.  If anything, it 
should be controlled by a capability flag.  This is a general issue for 
procfs and sysfs as used for controlling any kind of system resources, 
though.

> Defining a new mount option to set a per-sb limit seems useful though,
> as I could easily see wanting to limit containers (on a 1000-container
> system) to 3 ptys each for instance.

What probably would make more sense is to limit containers to a specific 
number of inodes or open file descriptors.  The pty limit was a quick 
hack to avoid DoS, but it's really equivalent (with a small multiplier, 
~3 or so) to "open inodes".

	-hpa

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 0/9] Multiple devpts instances
       [not found]                             ` <20090223205609.GA32351-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
  2009-02-23 21:18                               ` H. Peter Anvin
@ 2009-02-23 21:19                               ` Daniel Lezcano
       [not found]                                 ` <49A312E6.9090900-GANU6spQydw@public.gmane.org>
  1 sibling, 1 reply; 29+ messages in thread
From: Daniel Lezcano @ 2009-02-23 21:19 UTC (permalink / raw)
  To: Serge E. Hallyn
  Cc: kyle-hoO6YkzgTuCM0SS3m2neIg, David C. Hansen,
	bastian-yyjItF7Rl6lg9hUCZPvPmw, Eric W. Biederman, H. Peter Anvin,
	containers-qjLDD68F18O7TbgM5vRIOg,
	sukadev-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8,
	alan-qBU/x9rampVanCEyBjwyrvXRex20P6io,
	xemul-GEFAQzZX7r8dnm+yROfE0A

Serge E. Hallyn wrote:
> Quoting Eric W. Biederman (ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org):
>   
>> Daniel Lezcano <daniel.lezcano-GANU6spQydw@public.gmane.org> writes:
>>
>>     
>>> But if I am able to create a new instance of devpts for a container and modify
>>> the configuration of another devpts from this container, is it acceptable ? Can
>>> we convince people to use the containers for security and have anybody able to
>>> make a pty starvation from one container to another ?
>>>       
>> I hardly how that is significant.  Anyone can allocate the rest of the possible
>> pty's today.  The situation does not get worse with devpts.
>>
>> If you want security and permission arguments get with Serge and finish
>> the uid namespace.  The you will have a user that looks like root but
>> does not have permissions to do most things.
>>     
>
> Right, and in particular the way it would partially solve this issue is
> that the procsys limit file would be owned by root in the initial uid
> namespace, so root in a child container would not be able to write to
> it.
>
> Defining a new mount option to set a per-sb limit seems useful though,
> as I could easily see wanting to limit containers (on a 1000-container
> system) to 3 ptys each for instance.
>   

Yep,  I changed my mind, I think Eric and HPA are right. devpts is a 
file system and not a namespace even if the result is the same. That 
makes sense to keep a global sysctl for the root container and handle 
security problem with user namespace and mount option.

>>> If it is too much complicated to handle one value per new devpts instance, IMHO
>>> /proc/sys/kernel/pty/max should be, at least, read-only for the new instance, no?
>>>       
>> No.  Either we add a pty_max value to the filesystem like we did with ptmx
>> or we forget it.
>>     
>
> -serge
>
>
>   

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 0/9] Multiple devpts instances
       [not found]                                 ` <49A312E6.9090900-GANU6spQydw@public.gmane.org>
@ 2009-02-23 21:23                                   ` H. Peter Anvin
  0 siblings, 0 replies; 29+ messages in thread
From: H. Peter Anvin @ 2009-02-23 21:23 UTC (permalink / raw)
  To: Daniel Lezcano
  Cc: kyle-hoO6YkzgTuCM0SS3m2neIg, David C. Hansen,
	bastian-yyjItF7Rl6lg9hUCZPvPmw, Eric W. Biederman,
	containers-qjLDD68F18O7TbgM5vRIOg,
	sukadev-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8,
	alan-qBU/x9rampVanCEyBjwyrvXRex20P6io,
	xemul-GEFAQzZX7r8dnm+yROfE0A

Daniel Lezcano wrote:
> 
> Yep,  I changed my mind, I think Eric and HPA are right. devpts is a 
> file system and not a namespace even if the result is the same. That 
> makes sense to keep a global sysctl for the root container and handle 
> security problem with user namespace and mount option.
> 

No, it's more dramatic than that.

Namespaces are not resource allocation boundaries, even though in the 
container use case you probably want both.

Furthermore, namespaces are relatively straightforward in comparison: 
you generally either want to share a namespace or you don't.  Resource 
control policies are much more complex.  In the general case you want to 
be able to support a hierarchial cascade of policies; at the least you 
want to have global and local limits.

Furthermore, there are a number of use cases for resource allocation 
boundaries that do *not* involve namespaces.

	-hpa

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 0/9] Multiple devpts instances
       [not found]                                 ` <49A31299.8040501-YMNOUZJC4hwAvxtiuMwx3w@public.gmane.org>
@ 2009-02-23 22:27                                   ` Serge E. Hallyn
  2009-02-24  4:09                                   ` Eric W. Biederman
  1 sibling, 0 replies; 29+ messages in thread
From: Serge E. Hallyn @ 2009-02-23 22:27 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: kyle-hoO6YkzgTuCM0SS3m2neIg, David C. Hansen,
	bastian-yyjItF7Rl6lg9hUCZPvPmw, Eric W. Biederman,
	containers-qjLDD68F18O7TbgM5vRIOg,
	sukadev-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8,
	alan-qBU/x9rampVanCEyBjwyrvXRex20P6io,
	xemul-GEFAQzZX7r8dnm+yROfE0A

Quoting H. Peter Anvin (hpa-YMNOUZJC4hwAvxtiuMwx3w@public.gmane.org):
> Serge E. Hallyn wrote:
>>>
>>> If you want security and permission arguments get with Serge and finish
>>> the uid namespace.  The you will have a user that looks like root but
>>> does not have permissions to do most things.
>>
>> Right, and in particular the way it would partially solve this issue is
>> that the procsys limit file would be owned by root in the initial uid
>> namespace, so root in a child container would not be able to write to
>> it.
>>
>
> No, uid namespace is not the right thing for this.  If anything, it  

For what?  uid ns is right for file access controls among namespaces,
and I'm just detailing what the uid ns file controls will buy you in
terms of the procsys limits file...

I actually do prefer having the file write handler check for
CAP_SYS_RESOURCE, but have conceded that battle at least in the
cgroup arena.

> should be controlled by a capability flag.  This is a general issue for  
> procfs and sysfs as used for controlling any kind of system resources,  
> though.
>
>> Defining a new mount option to set a per-sb limit seems useful though,
>> as I could easily see wanting to limit containers (on a 1000-container
>> system) to 3 ptys each for instance.
>
> What probably would make more sense is to limit containers to a specific  
> number of inodes or open file descriptors.  The pty limit was a quick  
> hack to avoid DoS, but it's really equivalent (with a small multiplier,  
> ~3 or so) to "open inodes".

Yes that (# inodes) sounds good.

thanks,
-serge

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 0/9] Multiple devpts instances
       [not found]                                 ` <49A31299.8040501-YMNOUZJC4hwAvxtiuMwx3w@public.gmane.org>
  2009-02-23 22:27                                   ` Serge E. Hallyn
@ 2009-02-24  4:09                                   ` Eric W. Biederman
  1 sibling, 0 replies; 29+ messages in thread
From: Eric W. Biederman @ 2009-02-24  4:09 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: kyle-hoO6YkzgTuCM0SS3m2neIg, David C. Hansen,
	bastian-yyjItF7Rl6lg9hUCZPvPmw, containers-qjLDD68F18O7TbgM5vRIOg,
	sukadev-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8,
	alan-qBU/x9rampVanCEyBjwyrvXRex20P6io,
	xemul-GEFAQzZX7r8dnm+yROfE0A

"H. Peter Anvin" <hpa-YMNOUZJC4hwAvxtiuMwx3w@public.gmane.org> writes:

> Serge E. Hallyn wrote:
>>>
>>> If you want security and permission arguments get with Serge and finish
>>> the uid namespace.  The you will have a user that looks like root but
>>> does not have permissions to do most things.
>>
>> Right, and in particular the way it would partially solve this issue is
>> that the procsys limit file would be owned by root in the initial uid
>> namespace, so root in a child container would not be able to write to
>> it.
>>
>
> No, uid namespace is not the right thing for this.  If anything, it should be
> controlled by a capability flag.  This is a general issue for procfs and sysfs
> as used for controlling any kind of system resources, though.

Yes, it is a general issue.

The design we have on the table is actually a security credentials
namespace not a uid namespace.  It encompasses uids, gids,
capabilities, and keys.

One of the biggest design constraints is that it should be possible to
create a credentials namespace without permissions and not have to worry
about confusing suid root executables.  Which means in that context a suid
root executable needs to mean an executable from the perspective an observer
outside the credential namespace does not have elevated privileges.

Once we get there, whether it is a capability based resource check or
a uid based resource check there will be no access from inside of a
credentials namespace, because no processes will have permissions to
access the resource.  Which is what was asked for.

That neatly solves the proc and sysfs issues as well as many others such
as access to /etc/passwd.

Eric

^ permalink raw reply	[flat|nested] 29+ messages in thread

end of thread, other threads:[~2009-02-24  4:09 UTC | newest]

Thread overview: 29+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2008-10-15  5:30 [PATCH 0/9] Multiple devpts instances sukadev-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8
     [not found] ` <20081015053000.GA2039-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
2008-10-15  5:33   ` [PATCH 1/9] Remove devpts_root global sukadev-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8
2008-10-15  5:33   ` [PATCH 2/9] Per-mount allocated_ptys sukadev-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8
2008-10-15  5:34   ` [PATCH 3/9] Per-mount 'config' object sukadev-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8
2008-10-15  5:35   ` [PATCH 4/9] Extract option parsing to new function sukadev-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8
2008-10-15  5:35   ` [PATCH 5/9] Add DEVPTS_MULTIPLE_INSTANCES config token sukadev-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8
2008-10-15  5:36   ` [PATCH 6/9] Define mknod_ptmx() sukadev-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8
2008-10-15  5:37   ` [PATCH 7/9] Define get_init_pts_sb() sukadev-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8
2008-10-15  5:37   ` [PATCH 8/9] Enable multiple instances of devpts sukadev-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8
2008-10-15  5:38   ` [PATCH 9/9] Document usage of multiple-instances " sukadev-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8
     [not found]     ` <20081015053800.GI2215-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
2008-10-15 18:57       ` Serge E. Hallyn
     [not found]         ` <20081015185722.GA30005-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
2008-10-15 19:03           ` H. Peter Anvin
     [not found]             ` <48F63E76.3030907-YMNOUZJC4hwAvxtiuMwx3w@public.gmane.org>
2008-10-15 19:48               ` Serge E. Hallyn
2008-10-16 15:19   ` [PATCH 0/9] Multiple devpts instances Serge E. Hallyn
2009-02-19 15:43   ` Daniel Lezcano
     [not found]     ` <499D7E13.10601-GANU6spQydw@public.gmane.org>
2009-02-19 17:32       ` H. Peter Anvin
     [not found]         ` <499D97B1.1090902-YMNOUZJC4hwAvxtiuMwx3w@public.gmane.org>
2009-02-19 18:09           ` Daniel Lezcano
     [not found]             ` <499DA069.3040603-GANU6spQydw@public.gmane.org>
2009-02-19 19:58               ` H. Peter Anvin
     [not found]                 ` <499DB9DA.2070301-YMNOUZJC4hwAvxtiuMwx3w@public.gmane.org>
2009-02-19 22:28                   ` Eric W. Biederman
     [not found]                     ` <m1vdr6xdqv.fsf-+imSwln9KH6u2/kzUuoCbdi2O/JbrIOy@public.gmane.org>
2009-02-20  4:22                       ` H. Peter Anvin
2009-02-19 22:42                   ` Daniel Lezcano
     [not found]                     ` <499DE06E.4030108-GANU6spQydw@public.gmane.org>
2009-02-19 22:46                       ` H. Peter Anvin
2009-02-19 23:59                       ` Eric W. Biederman
     [not found]                         ` <m1eixuvv00.fsf-+imSwln9KH6u2/kzUuoCbdi2O/JbrIOy@public.gmane.org>
2009-02-23 20:56                           ` Serge E. Hallyn
     [not found]                             ` <20090223205609.GA32351-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
2009-02-23 21:18                               ` H. Peter Anvin
     [not found]                                 ` <49A31299.8040501-YMNOUZJC4hwAvxtiuMwx3w@public.gmane.org>
2009-02-23 22:27                                   ` Serge E. Hallyn
2009-02-24  4:09                                   ` Eric W. Biederman
2009-02-23 21:19                               ` Daniel Lezcano
     [not found]                                 ` <49A312E6.9090900-GANU6spQydw@public.gmane.org>
2009-02-23 21:23                                   ` H. Peter Anvin

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.