From mboxrd@z Thu Jan 1 00:00:00 1970
From: "Serge E. Hallyn"
Subject: Re: [PATCH 09/10] Enable multiple instances of devpts
Date: Mon, 29 Sep 2008 10:29:51 -0500
Message-ID: <20080929152951.GA32518@us.ibm.com>
References: <20080912174845.GA17350@us.ibm.com>
	<20080912175322.GJ17350@us.ibm.com>
	<20080924202616.GB31664@us.ibm.com>
	<20080926210347.GB31505@us.ibm.com>
	<20080929130131.GA12531@us.ibm.com>
	<20080929151828.GA10202@us.ibm.com>
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit
Content-Disposition: inline
In-Reply-To: <20080929151828.GA10202-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
Sender: containers-bounces-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA@public.gmane.org
Errors-To: containers-bounces-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA@public.gmane.org
To: sukadev-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org
Cc: kyle-hoO6YkzgTuCM0SS3m2neIg@public.gmane.org,
	bastian-yyjItF7Rl6lg9hUCZPvPmw@public.gmane.org,
	ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org,
	hpa-YMNOUZJC4hwAvxtiuMwx3w@public.gmane.org,
	containers-qjLDD68F18O7TbgM5vRIOg@public.gmane.org,
	xemul-GEFAQzZX7r8dnm+yROfE0A@public.gmane.org,
	alan-qBU/x9rampVanCEyBjwyrvXRex20P6io@public.gmane.org
List-Id: containers.vger.kernel.org

Quoting sukadev-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org (sukadev-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org):
> Serge E. Hallyn [serue-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org] wrote:
> | Quoting sukadev-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org (sukadev-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org):
> | > | > @@ -232,6 +246,8 @@ static int devpts_show_options(struct seq_file *seq, struct vfsmount *vfs)
> | > | >  	seq_printf(seq, ",mode=%03o", opts->mode);
> | > | >  #ifdef CONFIG_DEVPTS_MULTIPLE_INSTANCES
> | > | >  	seq_printf(seq, ",ptmxmode=%03o", opts->ptmxmode);
> | > | > +	if (opts->newinstance)
> | > | > +		seq_printf(seq, ",newinstance");
> | > |
> | > | Is actually that something we want to show?  It doesn't seem
> | > | informative.
> | >
> | > Without this users have no easy way of knowing whether they have a
> | > private mount specially if they mounted from command line ?
>
> You mean in a nested container ? I agree that it does not help then.
>
> |
> | If they were in a container to begin with, then they still don't know.
> |
> | Now if you were to keep a unique per-instance id and have show_options
> | list 'instance=%x', that would be helpful.  Either that or just
> | dropping the info altogether make sense.  This 'newinstance' listing
> | is meaningless.
> |
>
> Another way to look at it is that it is a mount option that was specified
> and we just report it. It may not be useful always but might help in some
> cases. But I am fine either way.
>
>
>
> | > | > +
> | > | > +		err = mknod_ptmx(mnt->mnt_sb);
> | > | > +		if (err) {
> | > | > +			dput(mnt->mnt_sb->s_root);
> | > | > +			deactivate_super(mnt->mnt_sb);
> | > | > +		} else
> | > | > +			devpts_mnt = mnt;
> | > | > +
> | > | > +		return err;
> | > |
> | > | There is no locking here, so in early-userspace two competing processes
> | > | could both try to set devpts_mnt, right?
> | >
> | > Hmm. I was thinking there would be only one thread calling the
> | > vfs_kern_mount() in init_devpts_fs.
> |
> | But what if init happens to (perhaps mistakenly) lead to 2 racing ones?
> |
> | Sure it's just a small memory leak, but why not just prevent it.
>
> Ok.
>
> |
> | > |
> | > | > +	}
> | > | > +
> | > | > +	return get_sb_ref(devpts_mnt->mnt_sb, flags, data, mnt);
> | > | > +}
> | > | > +
> | > | >  static int devpts_get_sb(struct file_system_type *fs_type,
> | > | >  	int flags, const char *dev_name, void *data, struct vfsmount *mnt)
> | > | >  {
> | > | > +	int new;
> | > | > +
> | > | > +	new = is_new_instance_mount(data);
> | > | > +	if (new < 0)
> | > | > +		return new;
> | > | > +
> | > | > +	if (new)
> | > | > +		return new_pts_mount(fs_type, flags, data, mnt);
> | > | > +
> | > | > +	return init_pts_mount(fs_type, flags, data, mnt);
> | > |
> | > | Wait a sec - so if a container does
> | > |
> | > | 	mount -t devpts -o newinstance none /dev/pts
> | > | and then later on just does
> | > | 	mount -t devpts none /dev/pts
> | > |
> | > | it'll get the init_pts_ns, not the one it had created?
> | >
> | > Yes. Should we treat the latter as remount of the private instance ?
> | > If so, user could add '-oremount' ?
> | >
> | > The logic seems simple: With newinstance create a private namespace.
> | > Without newinstance, bind to initial ns.
> |
> | But if I'm in a container in a new mounts ns and somehow managed
> | to umount -l /dev/pts, shouldn't i be able to remount my container's
> | devpts by just doing 'mount -t devpts devpts /dev/pts'?
>
> Now wouldn't that require us to associate the devpts mount with some
> notion of a container ? (a namespace object in nsproxy of container-init
> like we do with /proc).

Yes.

> Yes, after 'umount -l' we have lost _that_ devpts ns and we may have to
> 'redo' the relevant container-init parts

It all just feels fragile.  I realize this is an attempt to do the 'pure
fs approach' you were asked to do.  I just don't like the resulting
semantics.

-serge