From: Al Viro <viro@ftp.linux.org.uk>
To: Miklos Szeredi <miklos@szeredi.hu>
Cc: linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org,
	akpm@linux-foundation.org, torvalds@linux-foundation.org
Subject: Re: [RFC PATCH] file as directory
Date: Wed, 23 May 2007 10:51:27 +0100
Message-ID: <20070523095127.GQ4095@ftp.linux.org.uk>
In-Reply-To: <E1HqZPd-0008Dh-00@dorka.pomaz.szeredi.hu>

On Tue, May 22, 2007 at 08:48:49PM +0200, Miklos Szeredi wrote:
>   */
> -static int __follow_mount(struct path *path)
> +static int __follow_mount(struct path *path, bool enter)
>  {
>  	int res = 0;
>  	while (d_mountpoint(path->dentry)) {
> -		struct vfsmount *mounted = lookup_mnt(path->mnt, path->dentry);
> +		struct vfsmount *mounted =
> +			lookup_mnt(path->mnt, path->dentry, enter);
> +
>  		if (!mounted)
>  			break;
>  		dput(path->dentry);
> @@ -689,27 +697,37 @@ static int __follow_mount(struct path *p
>  	return res;
>  }
>  
> -static void follow_mount(struct vfsmount **mnt, struct dentry **dentry)
> +/*
> + * Follows mounts on the given nameidata.
> + *
> + * Only follows "directory on file" mounts if LOOKUP_ENTER is set.
> + */
> +void follow_mount(struct nameidata *nd)

BTW, I'd split that (and the matching updates in callers) into a
separate patch.

>  {
> -	while (d_mountpoint(*dentry)) {
> -		struct vfsmount *mounted = lookup_mnt(*mnt, *dentry);
> +	while (d_mountpoint(nd->dentry)) {
> +		bool enter = nd->flags & LOOKUP_ENTER;

int, surely?

> + * This is called if the object has no ->lookup() defined, yet the
> + * path contains a slash after the object name.
> + *
> + * If the filesystem defines an ->enter() method, this will be called,
> + * and the filesystem shall fill the supplied struct path or return an
> + * error.
> + *
> + * The returned path will be bind mounted on top of the object with
> + * the MNT_DIRONFILE flag, and the nameidata will descend into the
> + * mount.
> + */
> +static int enter_file(struct inode *inode, struct nameidata *nd)
> +{
> +	int err;
> +	struct path newpath;
> +
> +	printk(KERN_DEBUG "%s/%d enter %s/\n", current->comm, current->pid,
> +	       nd->dentry->d_name.name);
> +	if (!inode->i_op->enter)
> +		return -ENOTDIR;
> +
> +	newpath.mnt = NULL;
> +	newpath.dentry = NULL;
> +	err = inode->i_op->enter(nd, &newpath);
> +	if (!err) {
> +		err = mount_dironfile(nd, &newpath);
> +		pathput(&newpath);
> +	}
> +	return err;

Ouch.  What guarantees that two lookups won't race right here?  You are
not holding any locks at that point, AFAICS...

BTW, why newpath?  What's wrong with simply returning a new vfsmount
with the right ->mnt_root/->mnt_sb (instead of creating it inside
mount_dironfile())?  ERR_PTR() for error, struct vfsmount * for success...
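
The ERR_PTR() convention suggested above can be sketched in userspace
roughly as follows.  This is only an illustration, not code from the
patch: the helpers are stand-ins for the kernel's ERR_PTR()/IS_ERR()/
PTR_ERR(), and struct toy_mount, toy_enter() and toy_enter_file() are
hypothetical names invented here.  The point is that a single return
value carries either a valid object or an errno, so no separate
out-parameter like newpath is needed.

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/* Userspace stand-ins for the kernel's pointer-encoded error helpers. */
#define MAX_ERRNO 4095

static inline void *ERR_PTR(long error)
{
	return (void *)(intptr_t)error;
}

static inline long PTR_ERR(const void *ptr)
{
	return (long)(intptr_t)ptr;
}

static inline int IS_ERR(const void *ptr)
{
	/* The top MAX_ERRNO addresses are reserved for encoded errnos. */
	return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

/* Hypothetical, stripped-down mount object; NOT the real struct vfsmount. */
struct toy_mount {
	const char *root_name;
};

/*
 * Hypothetical ->enter()-style method: returns a new mount on success,
 * or an errno encoded in the pointer on failure.
 */
static struct toy_mount *toy_enter(int supported, const char *name)
{
	struct toy_mount *mnt;

	if (!supported)
		return ERR_PTR(-ENOTDIR);

	mnt = malloc(sizeof(*mnt));
	if (!mnt)
		return ERR_PTR(-ENOMEM);

	mnt->root_name = name;
	return mnt;
}

/* Hypothetical caller: one check distinguishes success from failure. */
static int toy_enter_file(int supported, const char *name)
{
	struct toy_mount *mnt = toy_enter(supported, name);

	if (IS_ERR(mnt))
		return (int)PTR_ERR(mnt);

	/* A real caller would attach the mount here; the sketch drops it. */
	free(mnt);
	return 0;
}
```

With this shape, toy_enter_file(0, "file") yields -ENOTDIR and
toy_enter_file(1, "file") yields 0, without the caller having to
pre-initialize a result structure.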

> @@ -301,8 +310,8 @@ static struct vfsmount *clone_mnt(struct
>  	mnt->mnt_mountpoint = mnt->mnt_root;
>  	mnt->mnt_parent = mnt;
>  
> -	/* don't copy the MNT_USER flag */
> -	mnt->mnt_flags &= ~MNT_USER;
> +	/* don't copy some flags */
> +	mnt->mnt_flags &= ~(MNT_USER | MNT_DIRONFILE);
>  	if (flag & CL_SETUSER)
>  		__set_mnt_user(mnt, owner);

Hmm?  So you do copy them and strip your MNT_DIRONFILE from copies?

> + * This is tricky, because for namespace modification we must take the
> + * namespace semaphore.  But mntput() is called from various places,
> + * sometimes with namespace_sem held.  Fortunately in those places the
> + * mount cannot yet have MNT_DIRONFILE, or at least that's what I
> + * hope...
> + *
> + * The umounting is done in two stages, first the mount is removed
> + * from the hashes.  This is done atomically wrt other mount lookups,
> + * so it's not possible to acquire a new ref to this dead mount that
> + * way.
> + *
> + * Then after having locked namespace_sem and relocked vfsmount_lock,
> + * the mount is properly detached.
> + */
> +static void umount_dironfile(struct vfsmount *mnt)
> +	__releases(vfsmount_lock)
> +{
> +	struct nameidata nd;

You've got to be kidding.  nameidata is *big*.  If anything, we want
to make detach_mnt() take struct path * instead, but even that is
lousy due to recursion.

I really don't like what's going on here.  The thing is, current code
is based on the assumption that presence in the mount tree => holding a
reference.  We _might_ deal with that (there was an old plan to change
the refcounting logic for vfsmounts), but that sort of game with locks
spells trouble.  What happens, for example, if the namespace gets cloned
before you grab namespace_sem?

There's another problem, BTW - a lot of stuff does stat + open + fstat +
compare kind of sequence.  You'll end up mounting/umounting between stat
and open, which opens you to race with somebody else.  Get a different
st_dev, eat a nice unreproducible error from application...
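
For reference, the stat + open + fstat + compare sequence mentioned
above typically looks something like the sketch below.  It is a
minimal userspace example, not taken from any particular application;
the function name open_checked() and its return codes are assumptions
made for illustration.  If the object is mounted and unmounted between
the stat() and the open(), the two st_dev values may differ and the
comparison fails even though nothing malicious happened.

```c
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: open PATH read-only and verify that the opened
 * object is the same one stat() saw beforehand.
 *
 * Returns the file descriptor on success, -1 if a system call failed,
 * and -2 if the device/inode pair changed between stat() and open().
 */
static int open_checked(const char *path)
{
	struct stat before, after;
	int fd;

	if (stat(path, &before) < 0)
		return -1;

	fd = open(path, O_RDONLY);
	if (fd < 0)
		return -1;

	if (fstat(fd, &after) < 0) {
		close(fd);
		return -1;
	}

	/* Identity check: a different st_dev or st_ino means a mismatch. */
	if (before.st_dev != after.st_dev || before.st_ino != after.st_ino) {
		close(fd);
		return -2;
	}

	return fd;
}
```

An application written this way would treat the -2 case as an error,
which is exactly the hard-to-reproduce failure described above.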
