RFC [PATCH 0/6] Client support for crossing NFS server mountpoints

linux-fsdevel.vger.kernel.org archive mirror

* RFC [PATCH 0/6] Client support for crossing NFS server mountpoints
@ 2006-04-11 17:45 Trond Myklebust
  2006-04-11 18:05 ` RFC [PATCH 1/6] VFS: Add GPL_EXPORTED function vfs_kern_mount() Trond Myklebust
                   ` (6 more replies)
  0 siblings, 7 replies; 14+ messages in thread
From: Trond Myklebust @ 2006-04-11 17:45 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: nfsv4, nfs

The following series of patches implements NFS client support for crossing
server submounts (assuming that the server is exporting them using the
'nohide' option).  We wish to ensure that inode numbers remain unique
on either side of the mountpoint, so that programs like 'tar' and
'rsync' do not get confused when confronted with files that have the same
inode number, but are actually on different filesystems on the server.

This is achieved by having the client automatically create a submount
that mirrors the one on the server.
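
As an illustration only (this is not code from the series): tools such as
'tar' and 'rsync' typically treat two names as the same file when both the
device number and the inode number reported by stat() match. The sketch
below shows that comparison. The names struct file_id, file_id_of() and
same_file() are hypothetical helpers invented for this example.

```c
#include <stdbool.h>
#include <sys/stat.h>
#include <sys/types.h>

/* Identity of a file as a userspace tool sees it. */
struct file_id {
	dev_t dev;	/* st_dev: which filesystem (mount) the file is on */
	ino_t ino;	/* st_ino: inode number within that filesystem */
};

/* Fill in 'out' from stat(); returns 0 on success, -1 on failure. */
int file_id_of(const char *path, struct file_id *out)
{
	struct stat st;

	if (stat(path, &st) != 0)
		return -1;
	out->dev = st.st_dev;
	out->ino = st.st_ino;
	return 0;
}

/* Two names refer to the same file only if BOTH fields match. */
bool same_file(struct file_id a, struct file_id b)
{
	return a.dev == b.dev && a.ino == b.ino;
}
```

With a client-side submount, files below /foo/bar report a different st_dev
from files in /foo, so an equal st_ino on both sides no longer looks like a
hard link to the same file.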

In order to avoid confusing users, we would like this mountpoint to be
transparent to 'umount'. IOW, when the user mounts the filesystem '/foo',
an automatic submount by the NFS client for /foo/bar should not cause
'umount /foo' to fail (particularly since the kernel cannot create entries
for /foo/bar in /etc/mtab). To get around this, we mark automatically
created submounts using the new flag MNT_SHRINKABLE, and then allow
the NFS client to attempt to unmount them whenever the user calls umount on
the parent.
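
The intended behaviour can be modelled in userspace roughly as follows.
This is a toy sketch under stated assumptions, not the kernel
implementation: struct toy_mount, toy_shrink_submounts() and toy_umount()
are hypothetical names, locking and namespaces are ignored, and "busy" is
reduced to a simple user count. Only the flag value 0x100 is taken from
patch 2.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_SHRINKABLE		0x100	/* same value as MNT_SHRINKABLE in patch 2 */
#define TOY_MAX_CHILDREN	4

struct toy_mount {
	int flags;
	int users;		/* assumption: non-zero means the mount is busy */
	bool mounted;
	struct toy_mount *child[TOY_MAX_CHILDREN];	/* unused slots are NULL */
};

/* True if any submount of 'm' is still mounted. */
bool toy_has_submounts(const struct toy_mount *m)
{
	for (size_t i = 0; i < TOY_MAX_CHILDREN; i++)
		if (m->child[i] && m->child[i]->mounted)
			return true;
	return false;
}

/*
 * Recursively detach shrinkable, idle submounts below 'parent'.
 * Submounts without the flag are skipped and not descended into.
 * Returns the number of submounts that went away.
 */
int toy_shrink_submounts(struct toy_mount *parent)
{
	int gone = 0;

	for (size_t i = 0; i < TOY_MAX_CHILDREN; i++) {
		struct toy_mount *c = parent->child[i];

		if (!c || !c->mounted || !(c->flags & TOY_SHRINKABLE))
			continue;
		gone += toy_shrink_submounts(c);
		if (c->users == 0 && !toy_has_submounts(c)) {
			c->mounted = false;
			gone++;
		}
	}
	return gone;
}

/* Unmount 'm' after first trying to shrink away its automatic submounts. */
int toy_umount(struct toy_mount *m)
{
	toy_shrink_submounts(m);
	if (m->users > 0 || toy_has_submounts(m))
		return -EBUSY;
	m->mounted = false;
	return 0;
}
```

In this model, unmounting /foo succeeds when /foo/bar is an idle automatic
submount, but still returns -EBUSY when the submount was created without the
flag (for example by the user), which matches the transparency goal
described above.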

Note: This code also serves as the base for NFSv4 'referral' support, in
which one server may direct the client to a different server as it crosses
into a filesystem that has been migrated.

Cheers,
  Trond

* RFC [PATCH 1/6] VFS: Add GPL_EXPORTED function vfs_kern_mount()
  2006-04-11 17:45 RFC [PATCH 0/6] Client support for crossing NFS server mountpoints Trond Myklebust
@ 2006-04-11 18:05 ` Trond Myklebust
  2006-04-17 18:52   ` Christoph Hellwig
  2006-04-11 18:05 ` RFC [PATCH 2/6] VFS: Add shrink_submounts() Trond Myklebust
                   ` (5 subsequent siblings)
  6 siblings, 1 reply; 14+ messages in thread
From: Trond Myklebust @ 2006-04-11 18:05 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: nfsv4, nfs

From: Trond Myklebust <Trond.Myklebust@netapp.com>

do_kern_mount() does not allow the kernel to use private mount interfaces
without exposing the same interfaces to userland. The problem is that the
filesystem is referenced by name, thus meaning that it and its mount
interface must be registered in the global filesystem list.

vfs_kern_mount() passes the struct file_system_type as an explicit
parameter in order to overcome this limitation.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
---

 fs/super.c            |   22 +++++++++++++++-------
 include/linux/mount.h |    5 +++++
 2 files changed, 20 insertions(+), 7 deletions(-)

diff --git a/fs/super.c b/fs/super.c
index a66f66b..848be4f 100644
--- a/fs/super.c
+++ b/fs/super.c
@@ -800,16 +800,12 @@ struct super_block *get_sb_single(struct
 EXPORT_SYMBOL(get_sb_single);
 
 struct vfsmount *
-do_kern_mount(const char *fstype, int flags, const char *name, void *data)
+vfs_kern_mount(struct file_system_type *type, int flags, const char *name, void *data)
 {
-	struct file_system_type *type = get_fs_type(fstype);
 	struct super_block *sb = ERR_PTR(-ENOMEM);
 	struct vfsmount *mnt;
 	int error;
 	char *secdata = NULL;
-
-	if (!type)
-		return ERR_PTR(-ENODEV);
 
 	mnt = alloc_vfsmnt(name);
 	if (!mnt)
@@ -841,7 +837,6 @@ do_kern_mount(const char *fstype, int fl
 	mnt->mnt_parent = mnt;
 	up_write(&sb->s_umount);
 	free_secdata(secdata);
-	put_filesystem(type);
 	return mnt;
 out_sb:
 	up_write(&sb->s_umount);
@@ -852,8 +847,21 @@ out_free_secdata:
 out_mnt:
 	free_vfsmnt(mnt);
 out:
-	put_filesystem(type);
 	return (struct vfsmount *)sb;
+}
+
+EXPORT_SYMBOL_GPL(vfs_kern_mount);
+
+struct vfsmount *
+do_kern_mount(const char *fstype, int flags, const char *name, void *data)
+{
+	struct file_system_type *type = get_fs_type(fstype);
+	struct vfsmount *mnt;
+	if (!type)
+		return ERR_PTR(-ENODEV);
+	mnt = vfs_kern_mount(type, flags, name, data);
+	put_filesystem(type);
+	return mnt;
 }
 
 EXPORT_SYMBOL_GPL(do_kern_mount);
diff --git a/include/linux/mount.h b/include/linux/mount.h
index b7472ae..aff68c3 100644
--- a/include/linux/mount.h
+++ b/include/linux/mount.h
@@ -73,6 +73,11 @@ extern struct vfsmount *alloc_vfsmnt(con
 extern struct vfsmount *do_kern_mount(const char *fstype, int flags,
 				      const char *name, void *data);
 
+struct file_system_type;
+extern struct vfsmount *vfs_kern_mount(struct file_system_type *type,
+				      int flags, const char *name,
+				      void *data);
+
 struct nameidata;
 
 extern int do_add_mount(struct vfsmount *newmnt, struct nameidata *nd,

* Re: RFC [PATCH 1/6] VFS: Add GPL_EXPORTED function vfs_kern_mount()
  2006-04-11 18:05 ` RFC [PATCH 1/6] VFS: Add GPL_EXPORTED function vfs_kern_mount() Trond Myklebust
@ 2006-04-17 18:52   ` Christoph Hellwig
  2006-04-17 19:35     ` Trond Myklebust
  0 siblings, 1 reply; 14+ messages in thread
From: Christoph Hellwig @ 2006-04-17 18:52 UTC (permalink / raw)
  To: Trond Myklebust; +Cc: linux-fsdevel, nfs, nfsv4

On Tue, Apr 11, 2006 at 02:05:30PM -0400, Trond Myklebust wrote:
> From: Trond Myklebust <Trond.Myklebust@netapp.com>
> 
> do_kern_mount() does not allow the kernel to use private mount interfaces
> without exposing the same interfaces to userland. The problem is that the
> filesystem is referenced by name, thus meaning that it and its mount
> interface must be registered in the global filesystem list.
> 
> vfs_kern_mount() passes the struct file_system_type as an explicit
> parameter in order to overcome this limitation.

Looks good.  In addition please switch kern_mount to use it instead
of converting from struct file_system_type to name and back.  Also
all other callers of do_kern_mount except for do_new_mount should
probably use it directly instead of doing the name lookup.  Except
for simple_pin_fs(), which will need a parameter change, all those
would be trivial as well.  So instead of adding another entry point, care
to switch the existing one to a saner prototype and the sane name?
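
To make the distinction under review concrete, here is a minimal userspace
model (an illustration only, not kernel code and not part of the patch).
It contrasts a mount entry point that looks the filesystem type up by name
in a global list with one that takes the type as an explicit parameter.
All toy_* names are hypothetical, and the model returns NULL where the real
do_kern_mount() returns ERR_PTR(-ENODEV).

```c
#include <stddef.h>
#include <string.h>

struct toy_fs_type {
	const char *name;
	struct toy_fs_type *next;	/* link in the global registry */
};

/* Stand-in for the global filesystem list. */
static struct toy_fs_type *toy_registry;

void toy_register(struct toy_fs_type *type)
{
	type->next = toy_registry;
	toy_registry = type;
}

/* Stand-in for get_fs_type(): find a registered type by name. */
struct toy_fs_type *toy_get_fs_type(const char *name)
{
	for (struct toy_fs_type *t = toy_registry; t; t = t->next)
		if (strcmp(t->name, name) == 0)
			return t;
	return NULL;
}

/*
 * Modelled on vfs_kern_mount(): the caller hands over the type directly,
 * so the type does not have to be in the registry. The "mount" is
 * represented simply by the type that would back it.
 */
struct toy_fs_type *toy_vfs_kern_mount(struct toy_fs_type *type)
{
	return type;
}

/*
 * Modelled on do_kern_mount(): only types reachable by name can be
 * mounted, because the lookup goes through the registry.
 */
struct toy_fs_type *toy_do_kern_mount(const char *name)
{
	struct toy_fs_type *type = toy_get_fs_type(name);

	if (!type)
		return NULL;
	return toy_vfs_kern_mount(type);
}
```

A caller that already holds the type, as kern_mount() does, gains nothing
from the by-name wrapper: it would turn the type into a name only for the
lookup to turn the name back into the same type.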

* Re: RFC [PATCH 1/6] VFS: Add GPL_EXPORTED function vfs_kern_mount()
  2006-04-17 18:52   ` Christoph Hellwig
@ 2006-04-17 19:35     ` Trond Myklebust
  2006-04-17 19:39       ` Christoph Hellwig
  0 siblings, 1 reply; 14+ messages in thread
From: Trond Myklebust @ 2006-04-17 19:35 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: linux-fsdevel, nfsv4, nfs

On Mon, 2006-04-17 at 19:52 +0100, Christoph Hellwig wrote:
> On Tue, Apr 11, 2006 at 02:05:30PM -0400, Trond Myklebust wrote:
> > From: Trond Myklebust <Trond.Myklebust@netapp.com>
> > 
> > do_kern_mount() does not allow the kernel to use private mount interfaces
> > without exposing the same interfaces to userland. The problem is that the
> > filesystem is referenced by name, thus meaning that it and its mount
> > interface must be registered in the global filesystem list.
> > 
> > vfs_kern_mount() passes the struct file_system_type as an explicit
> > parameter in order to overcome this limitation.
> 
> Looks good.  In addition please switch kern_mount to use it instead
> of converting from struct file_system_type to name and back.  Also
> all other callers of do_kern_mount except for do_new_mount should
> probably use it directly instead of doing the name lookup.  Except
> for simple_pin_fs(), which will need a parameter change, all those
> would be trivial as well.  So instead of adding another entry point, care
> to switch the existing one to a saner prototype and the sane name?

That sounds reasonable. By 'switch to the sane name' you do mean convert
all uses of 'do_kern_mount' to 'vfs_kern_mount'?

Cheers,
  Trond

* Re: RFC [PATCH 1/6] VFS: Add GPL_EXPORTED function vfs_kern_mount()
  2006-04-17 19:35     ` Trond Myklebust
@ 2006-04-17 19:39       ` Christoph Hellwig
  2006-04-17 20:44         ` Trond Myklebust
  0 siblings, 1 reply; 14+ messages in thread
From: Christoph Hellwig @ 2006-04-17 19:39 UTC (permalink / raw)
  To: Trond Myklebust; +Cc: linux-fsdevel, nfsv4, nfs

On Mon, Apr 17, 2006 at 03:35:43PM -0400, Trond Myklebust wrote:
> > all other callers of do_kern_mount except for do_new_mount should
> > probably use it directly instead of doing the name lookup.  Except
> > for simple_pin_fs(), which will need a parameter change, all those
> > would be trivial as well.  So instead of adding another entry point, care
> > to switch the existing one to a saner prototype and the sane name?
> 
> That sounds reasonable. By 'switch to the sane name' you do mean convert
> all uses of 'do_kern_mount' to 'vfs_kern_mount'?

Yes, sorry for the odd wording.

* Re: RFC [PATCH 1/6] VFS: Add GPL_EXPORTED function vfs_kern_mount()
  2006-04-17 19:39       ` Christoph Hellwig
@ 2006-04-17 20:44         ` Trond Myklebust
  2006-04-17 23:39           ` Trond Myklebust
  0 siblings, 1 reply; 14+ messages in thread
From: Trond Myklebust @ 2006-04-17 20:44 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: linux-fsdevel, nfsv4, nfs

[-- Attachment #1: Type: text/plain, Size: 1154 bytes --]

On Mon, 2006-04-17 at 20:39 +0100, Christoph Hellwig wrote:
> On Mon, Apr 17, 2006 at 03:35:43PM -0400, Trond Myklebust wrote:
> > > all other callers of do_kern_mount except for do_new_mount should
> > > probably use it directly instead of doing the name lookup.  Except
> > > for simple_pin_fs(), which will need a parameter change, all those
> > > would be trivial as well.  So instead of adding another entry point, care
> > > to switch the existing one to a saner prototype and the sane name?
> > 
> > That sounds reasonable. By 'switch to the sane name' you do mean convert
> > all uses of 'do_kern_mount' to 'vfs_kern_mount'?
> 
> Yes, sorry for the odd wording.

Hmm... Unfortunately, there appear to be a couple of cases in the VFS
where we actually prefer to use do_kern_mount. I'm thinking in
particular of the cases of fs/nfsctl.c (where we don't want to introduce
a dependency of the VFS on the nfsd module), and of the case of "rootfs"
mounting (where a couple of the arm architectures appear to have quirky
private structures).

We can eliminate all but 3 callers, though, through something like the
attached (untested!) patch.

Cheers,
  Trond


[-- Attachment #2: linux-2.6.17-019-unexport_do_kern_mount.dif --]
[-- Type: text/plain, Size: 8118 bytes --]

Author: Trond Myklebust <Trond.Myklebust@netapp.com>
VFS: Unexport do_kern_mount() and clean up simple_pin_fs()

Replace all module uses with the new vfs_kern_mount() interface, and fix up
simple_pin_fs().

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
---

 Documentation/filesystems/automount-support.txt |    2 +-
 drivers/usb/core/inode.c                        |    2 +-
 fs/afs/mntpt.c                                  |    2 +-
 fs/afs/super.c                                  |    2 +-
 fs/afs/super.h                                  |    2 ++
 fs/binfmt_misc.c                                |    3 ++-
 fs/configfs/mount.c                             |    2 +-
 fs/debugfs/inode.c                              |    2 +-
 fs/libfs.c                                      |    4 ++--
 fs/super.c                                      |    4 +---
 include/linux/fs.h                              |    2 +-
 mm/shmem.c                                      |    2 +-
 net/sunrpc/rpc_pipe.c                           |    2 +-
 security/inode.c                                |    2 +-
 14 files changed, 17 insertions(+), 16 deletions(-)

diff --git a/Documentation/filesystems/automount-support.txt b/Documentation/filesystems/automount-support.txt
index 58c65a1..7cac200 100644
--- a/Documentation/filesystems/automount-support.txt
+++ b/Documentation/filesystems/automount-support.txt
@@ -19,7 +19,7 @@ following procedure:
 
  (2) Have the follow_link() op do the following steps:
 
-     (a) Call do_kern_mount() to call the appropriate filesystem to set up a
+     (a) Call vfs_kern_mount() to call the appropriate filesystem to set up a
          superblock and gain a vfsmount structure representing it.
 
      (b) Copy the nameidata provided as an argument and substitute the dentry
diff --git a/drivers/usb/core/inode.c b/drivers/usb/core/inode.c
index 3cf945c..695b90a 100644
--- a/drivers/usb/core/inode.c
+++ b/drivers/usb/core/inode.c
@@ -569,7 +569,7 @@ static int create_special_files (void)
 	ignore_mount = 1;
 
 	/* create the devices special file */
-	retval = simple_pin_fs("usbfs", &usbfs_mount, &usbfs_mount_count);
+	retval = simple_pin_fs(&usb_fs_type, &usbfs_mount, &usbfs_mount_count);
 	if (retval) {
 		err ("Unable to get usbfs mount");
 		goto exit;
diff --git a/fs/afs/mntpt.c b/fs/afs/mntpt.c
index 4e6eeb5..7b6dc03 100644
--- a/fs/afs/mntpt.c
+++ b/fs/afs/mntpt.c
@@ -210,7 +210,7 @@ static struct vfsmount *afs_mntpt_do_aut
 
 	/* try and do the mount */
 	kdebug("--- attempting mount %s -o %s ---", devname, options);
-	mnt = do_kern_mount("afs", 0, devname, options);
+	mnt = vfs_kern_mount(&afs_fs_type, 0, devname, options);
 	kdebug("--- mount result %p ---", mnt);
 
 	free_page((unsigned long) devname);
diff --git a/fs/afs/super.c b/fs/afs/super.c
index 53c56e7..93a7821 100644
--- a/fs/afs/super.c
+++ b/fs/afs/super.c
@@ -48,7 +48,7 @@ static void afs_put_super(struct super_b
 
 static void afs_destroy_inode(struct inode *inode);
 
-static struct file_system_type afs_fs_type = {
+struct file_system_type afs_fs_type = {
 	.owner		= THIS_MODULE,
 	.name		= "afs",
 	.get_sb		= afs_get_sb,
diff --git a/fs/afs/super.h b/fs/afs/super.h
index ac11362..32de8cc 100644
--- a/fs/afs/super.h
+++ b/fs/afs/super.h
@@ -38,6 +38,8 @@ static inline struct afs_super_info *AFS
 	return sb->s_fs_info;
 }
 
+extern struct file_system_type afs_fs_type;
+
 #endif /* __KERNEL__ */
 
 #endif /* _LINUX_AFS_SUPER_H */
diff --git a/fs/binfmt_misc.c b/fs/binfmt_misc.c
index d73d755..c0a909e 100644
--- a/fs/binfmt_misc.c
+++ b/fs/binfmt_misc.c
@@ -55,6 +55,7 @@ typedef struct {
 } Node;
 
 static DEFINE_RWLOCK(entries_lock);
+static struct file_system_type bm_fs_type;
 static struct vfsmount *bm_mnt;
 static int entry_count;
 
@@ -638,7 +639,7 @@ static ssize_t bm_register_write(struct 
 	if (!inode)
 		goto out2;
 
-	err = simple_pin_fs("binfmt_misc", &bm_mnt, &entry_count);
+	err = simple_pin_fs(&bm_fs_type, &bm_mnt, &entry_count);
 	if (err) {
 		iput(inode);
 		inode = NULL;
diff --git a/fs/configfs/mount.c b/fs/configfs/mount.c
index f920d30..be5d86a 100644
--- a/fs/configfs/mount.c
+++ b/fs/configfs/mount.c
@@ -118,7 +118,7 @@ static struct file_system_type configfs_
 
 int configfs_pin_fs(void)
 {
-	return simple_pin_fs("configfs", &configfs_mount,
+	return simple_pin_fs(&configfs_fs_type, &configfs_mount,
 			     &configfs_mnt_count);
 }
 
diff --git a/fs/debugfs/inode.c b/fs/debugfs/inode.c
index 85d166c..579e1b6 100644
--- a/fs/debugfs/inode.c
+++ b/fs/debugfs/inode.c
@@ -198,7 +198,7 @@ struct dentry *debugfs_create_file(const
 
 	pr_debug("debugfs: creating file '%s'\n",name);
 
-	error = simple_pin_fs("debugfs", &debugfs_mount, &debugfs_mount_count);
+	error = simple_pin_fs(&debug_fs_type, &debugfs_mount, &debugfs_mount_count);
 	if (error)
 		goto exit;
 
diff --git a/fs/libfs.c b/fs/libfs.c
index 7145ba7..75bb681 100644
--- a/fs/libfs.c
+++ b/fs/libfs.c
@@ -424,13 +424,13 @@ out:
 
 static DEFINE_SPINLOCK(pin_fs_lock);
 
-int simple_pin_fs(char *name, struct vfsmount **mount, int *count)
+int simple_pin_fs(struct file_system_type *type, char *name, struct vfsmount **mount, int *count)
 {
 	struct vfsmount *mnt = NULL;
 	spin_lock(&pin_fs_lock);
 	if (unlikely(!*mount)) {
 		spin_unlock(&pin_fs_lock);
-		mnt = do_kern_mount(name, 0, name, NULL);
+		mnt = vfs_kern_mount(type, 0, name, NULL);
 		if (IS_ERR(mnt))
 			return PTR_ERR(mnt);
 		spin_lock(&pin_fs_lock);
diff --git a/fs/super.c b/fs/super.c
index 848be4f..15f2afd 100644
--- a/fs/super.c
+++ b/fs/super.c
@@ -864,11 +864,9 @@ do_kern_mount(const char *fstype, int fl
 	return mnt;
 }
 
-EXPORT_SYMBOL_GPL(do_kern_mount);
-
 struct vfsmount *kern_mount(struct file_system_type *type)
 {
-	return do_kern_mount(type->name, 0, type->name, NULL);
+	return vfs_kern_mount(type, 0, type->name, NULL);
 }
 
 EXPORT_SYMBOL(kern_mount);
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 3de2bfb..6151043 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -1762,7 +1762,7 @@ extern struct inode_operations simple_di
 struct tree_descr { char *name; const struct file_operations *ops; int mode; };
 struct dentry *d_alloc_name(struct dentry *, const char *);
 extern int simple_fill_super(struct super_block *, int, struct tree_descr *);
-extern int simple_pin_fs(char *name, struct vfsmount **mount, int *count);
+extern int simple_pin_fs(struct file_system_type *, char *name, struct vfsmount **mount, int *count);
 extern void simple_release_fs(struct vfsmount **mount, int *count);
 
 extern ssize_t simple_read_from_buffer(void __user *, size_t, loff_t *, const void *, size_t);
diff --git a/mm/shmem.c b/mm/shmem.c
index 37eaf42..180deb4 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -2258,7 +2258,7 @@ static int __init init_tmpfs(void)
 #ifdef CONFIG_TMPFS
 	devfs_mk_dir("shm");
 #endif
-	shm_mnt = do_kern_mount(tmpfs_fs_type.name, MS_NOUSER,
+	shm_mnt = vfs_kern_mount(&tmpfs_fs_type, MS_NOUSER,
 				tmpfs_fs_type.name, NULL);
 	if (IS_ERR(shm_mnt)) {
 		error = PTR_ERR(shm_mnt);
diff --git a/net/sunrpc/rpc_pipe.c b/net/sunrpc/rpc_pipe.c
index cc673dd..a5226df 100644
--- a/net/sunrpc/rpc_pipe.c
+++ b/net/sunrpc/rpc_pipe.c
@@ -439,7 +439,7 @@ struct vfsmount *rpc_get_mount(void)
 {
 	int err;
 
-	err = simple_pin_fs("rpc_pipefs", &rpc_mount, &rpc_mount_count);
+	err = simple_pin_fs(&rpc_pipe_fs_type, &rpc_mount, &rpc_mount_count);
 	if (err != 0)
 		return ERR_PTR(err);
 	return rpc_mount;
diff --git a/security/inode.c b/security/inode.c
index 0f77b02..8bf4062 100644
--- a/security/inode.c
+++ b/security/inode.c
@@ -224,7 +224,7 @@ struct dentry *securityfs_create_file(co
 
 	pr_debug("securityfs: creating file '%s'\n",name);
 
-	error = simple_pin_fs("securityfs", &mount, &mount_count);
+	error = simple_pin_fs(&fs_type, &mount, &mount_count);
 	if (error) {
 		dentry = ERR_PTR(error);
 		goto exit;

[-- Attachment #3: Type: text/plain, Size: 138 bytes --]

_______________________________________________
NFSv4 mailing list
NFSv4@linux-nfs.org
http://linux-nfs.org/cgi-bin/mailman/listinfo/nfsv4

* Re: RFC [PATCH 1/6] VFS: Add GPL_EXPORTED function vfs_kern_mount()
  2006-04-17 20:44         ` Trond Myklebust
@ 2006-04-17 23:39           ` Trond Myklebust
  0 siblings, 0 replies; 14+ messages in thread
From: Trond Myklebust @ 2006-04-17 23:39 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: linux-fsdevel, nfsv4, nfs

[-- Attachment #1: Type: text/plain, Size: 627 bytes --]

On Mon, 2006-04-17 at 16:44 -0400, Trond Myklebust wrote:

> Hmm... Unfortunately, there appear to be a couple of cases in the VFS
> where we actually prefer to use do_kern_mount. I'm thinking in
> particular of the cases of fs/nfsctl.c (where we don't want to introduce
> a dependency of the VFS on the nfsd module), and of the case of "rootfs"
> mounting (where a couple of the arm architectures appear to have quirky
> private structures).
> 
> We can eliminate all but 3 callers, though, through something like the
> attached (untested!) patch.

...and here is the version that actually compiles and runs.

Cheers,
  Trond

[-- Attachment #2: linux-2.6.17-019-unexport_do_kern_mount.dif --]
[-- Type: text/plain, Size: 8100 bytes --]

Author: Trond Myklebust <Trond.Myklebust@netapp.com>
VFS: Unexport do_kern_mount() and clean up simple_pin_fs()

Replace all module uses with the new vfs_kern_mount() interface, and fix up
simple_pin_fs().

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
---

 Documentation/filesystems/automount-support.txt |    2 +-
 drivers/usb/core/inode.c                        |    2 +-
 fs/afs/mntpt.c                                  |    2 +-
 fs/afs/super.c                                  |    2 +-
 fs/afs/super.h                                  |    2 ++
 fs/binfmt_misc.c                                |    3 ++-
 fs/configfs/mount.c                             |    2 +-
 fs/debugfs/inode.c                              |    2 +-
 fs/libfs.c                                      |    4 ++--
 fs/super.c                                      |    4 +---
 include/linux/fs.h                              |    2 +-
 mm/shmem.c                                      |    2 +-
 net/sunrpc/rpc_pipe.c                           |    2 +-
 security/inode.c                                |    2 +-
 14 files changed, 17 insertions(+), 16 deletions(-)

diff --git a/Documentation/filesystems/automount-support.txt b/Documentation/filesystems/automount-support.txt
index 58c65a1..7cac200 100644
--- a/Documentation/filesystems/automount-support.txt
+++ b/Documentation/filesystems/automount-support.txt
@@ -19,7 +19,7 @@ following procedure:
 
  (2) Have the follow_link() op do the following steps:
 
-     (a) Call do_kern_mount() to call the appropriate filesystem to set up a
+     (a) Call vfs_kern_mount() to call the appropriate filesystem to set up a
          superblock and gain a vfsmount structure representing it.
 
      (b) Copy the nameidata provided as an argument and substitute the dentry
diff --git a/drivers/usb/core/inode.c b/drivers/usb/core/inode.c
index 3cf945c..695b90a 100644
--- a/drivers/usb/core/inode.c
+++ b/drivers/usb/core/inode.c
@@ -569,7 +569,7 @@ static int create_special_files (void)
 	ignore_mount = 1;
 
 	/* create the devices special file */
-	retval = simple_pin_fs("usbfs", &usbfs_mount, &usbfs_mount_count);
+	retval = simple_pin_fs(&usb_fs_type, &usbfs_mount, &usbfs_mount_count);
 	if (retval) {
 		err ("Unable to get usbfs mount");
 		goto exit;
diff --git a/fs/afs/mntpt.c b/fs/afs/mntpt.c
index 4e6eeb5..7b6dc03 100644
--- a/fs/afs/mntpt.c
+++ b/fs/afs/mntpt.c
@@ -210,7 +210,7 @@ static struct vfsmount *afs_mntpt_do_aut
 
 	/* try and do the mount */
 	kdebug("--- attempting mount %s -o %s ---", devname, options);
-	mnt = do_kern_mount("afs", 0, devname, options);
+	mnt = vfs_kern_mount(&afs_fs_type, 0, devname, options);
 	kdebug("--- mount result %p ---", mnt);
 
 	free_page((unsigned long) devname);
diff --git a/fs/afs/super.c b/fs/afs/super.c
index 53c56e7..93a7821 100644
--- a/fs/afs/super.c
+++ b/fs/afs/super.c
@@ -48,7 +48,7 @@ static void afs_put_super(struct super_b
 
 static void afs_destroy_inode(struct inode *inode);
 
-static struct file_system_type afs_fs_type = {
+struct file_system_type afs_fs_type = {
 	.owner		= THIS_MODULE,
 	.name		= "afs",
 	.get_sb		= afs_get_sb,
diff --git a/fs/afs/super.h b/fs/afs/super.h
index ac11362..32de8cc 100644
--- a/fs/afs/super.h
+++ b/fs/afs/super.h
@@ -38,6 +38,8 @@ static inline struct afs_super_info *AFS
 	return sb->s_fs_info;
 }
 
+extern struct file_system_type afs_fs_type;
+
 #endif /* __KERNEL__ */
 
 #endif /* _LINUX_AFS_SUPER_H */
diff --git a/fs/binfmt_misc.c b/fs/binfmt_misc.c
index d73d755..c0a909e 100644
--- a/fs/binfmt_misc.c
+++ b/fs/binfmt_misc.c
@@ -55,6 +55,7 @@ typedef struct {
 } Node;
 
 static DEFINE_RWLOCK(entries_lock);
+static struct file_system_type bm_fs_type;
 static struct vfsmount *bm_mnt;
 static int entry_count;
 
@@ -638,7 +639,7 @@ static ssize_t bm_register_write(struct 
 	if (!inode)
 		goto out2;
 
-	err = simple_pin_fs("binfmt_misc", &bm_mnt, &entry_count);
+	err = simple_pin_fs(&bm_fs_type, &bm_mnt, &entry_count);
 	if (err) {
 		iput(inode);
 		inode = NULL;
diff --git a/fs/configfs/mount.c b/fs/configfs/mount.c
index f920d30..be5d86a 100644
--- a/fs/configfs/mount.c
+++ b/fs/configfs/mount.c
@@ -118,7 +118,7 @@ static struct file_system_type configfs_
 
 int configfs_pin_fs(void)
 {
-	return simple_pin_fs("configfs", &configfs_mount,
+	return simple_pin_fs(&configfs_fs_type, &configfs_mount,
 			     &configfs_mnt_count);
 }
 
diff --git a/fs/debugfs/inode.c b/fs/debugfs/inode.c
index 85d166c..579e1b6 100644
--- a/fs/debugfs/inode.c
+++ b/fs/debugfs/inode.c
@@ -198,7 +198,7 @@ struct dentry *debugfs_create_file(const
 
 	pr_debug("debugfs: creating file '%s'\n",name);
 
-	error = simple_pin_fs("debugfs", &debugfs_mount, &debugfs_mount_count);
+	error = simple_pin_fs(&debug_fs_type, &debugfs_mount, &debugfs_mount_count);
 	if (error)
 		goto exit;
 
diff --git a/fs/libfs.c b/fs/libfs.c
index 7145ba7..4a3ec9a 100644
--- a/fs/libfs.c
+++ b/fs/libfs.c
@@ -424,13 +424,13 @@ out:
 
 static DEFINE_SPINLOCK(pin_fs_lock);
 
-int simple_pin_fs(char *name, struct vfsmount **mount, int *count)
+int simple_pin_fs(struct file_system_type *type, struct vfsmount **mount, int *count)
 {
 	struct vfsmount *mnt = NULL;
 	spin_lock(&pin_fs_lock);
 	if (unlikely(!*mount)) {
 		spin_unlock(&pin_fs_lock);
-		mnt = do_kern_mount(name, 0, name, NULL);
+		mnt = vfs_kern_mount(type, 0, type->name, NULL);
 		if (IS_ERR(mnt))
 			return PTR_ERR(mnt);
 		spin_lock(&pin_fs_lock);
diff --git a/fs/super.c b/fs/super.c
index 848be4f..15f2afd 100644
--- a/fs/super.c
+++ b/fs/super.c
@@ -864,11 +864,9 @@ do_kern_mount(const char *fstype, int fl
 	return mnt;
 }
 
-EXPORT_SYMBOL_GPL(do_kern_mount);
-
 struct vfsmount *kern_mount(struct file_system_type *type)
 {
-	return do_kern_mount(type->name, 0, type->name, NULL);
+	return vfs_kern_mount(type, 0, type->name, NULL);
 }
 
 EXPORT_SYMBOL(kern_mount);
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 3de2bfb..0d1d2b4 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -1762,7 +1762,7 @@ extern struct inode_operations simple_di
 struct tree_descr { char *name; const struct file_operations *ops; int mode; };
 struct dentry *d_alloc_name(struct dentry *, const char *);
 extern int simple_fill_super(struct super_block *, int, struct tree_descr *);
-extern int simple_pin_fs(char *name, struct vfsmount **mount, int *count);
+extern int simple_pin_fs(struct file_system_type *, struct vfsmount **mount, int *count);
 extern void simple_release_fs(struct vfsmount **mount, int *count);
 
 extern ssize_t simple_read_from_buffer(void __user *, size_t, loff_t *, const void *, size_t);
diff --git a/mm/shmem.c b/mm/shmem.c
index 37eaf42..180deb4 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -2258,7 +2258,7 @@ static int __init init_tmpfs(void)
 #ifdef CONFIG_TMPFS
 	devfs_mk_dir("shm");
 #endif
-	shm_mnt = do_kern_mount(tmpfs_fs_type.name, MS_NOUSER,
+	shm_mnt = vfs_kern_mount(&tmpfs_fs_type, MS_NOUSER,
 				tmpfs_fs_type.name, NULL);
 	if (IS_ERR(shm_mnt)) {
 		error = PTR_ERR(shm_mnt);
diff --git a/net/sunrpc/rpc_pipe.c b/net/sunrpc/rpc_pipe.c
index cc673dd..a5226df 100644
--- a/net/sunrpc/rpc_pipe.c
+++ b/net/sunrpc/rpc_pipe.c
@@ -439,7 +439,7 @@ struct vfsmount *rpc_get_mount(void)
 {
 	int err;
 
-	err = simple_pin_fs("rpc_pipefs", &rpc_mount, &rpc_mount_count);
+	err = simple_pin_fs(&rpc_pipe_fs_type, &rpc_mount, &rpc_mount_count);
 	if (err != 0)
 		return ERR_PTR(err);
 	return rpc_mount;
diff --git a/security/inode.c b/security/inode.c
index 0f77b02..8bf4062 100644
--- a/security/inode.c
+++ b/security/inode.c
@@ -224,7 +224,7 @@ struct dentry *securityfs_create_file(co
 
 	pr_debug("securityfs: creating file '%s'\n",name);
 
-	error = simple_pin_fs("securityfs", &mount, &mount_count);
+	error = simple_pin_fs(&fs_type, &mount, &mount_count);
 	if (error) {
 		dentry = ERR_PTR(error);
 		goto exit;

* RFC [PATCH 2/6] VFS: Add shrink_submounts()
  2006-04-11 17:45 RFC [PATCH 0/6] Client support for crossing NFS server mountpoints Trond Myklebust
  2006-04-11 18:05 ` RFC [PATCH 1/6] VFS: Add GPL_EXPORTED function vfs_kern_mount() Trond Myklebust
@ 2006-04-11 18:05 ` Trond Myklebust
  2006-04-11 18:05 ` RFC [PATCH 3/6] VFS: Remove dependency of ->umount_begin() call on MNT_FORCE Trond Myklebust
                   ` (4 subsequent siblings)
  6 siblings, 0 replies; 14+ messages in thread
From: Trond Myklebust @ 2006-04-11 18:05 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: nfsv4, nfs

From: Trond Myklebust <Trond.Myklebust@netapp.com>

Allow a submount to be marked as being 'shrinkable' by means of the
vfsmount->mnt_flags, and then add a function 'shrink_submounts()' which
attempts to recursively unmount these submounts.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
---

 fs/namespace.c        |  124 +++++++++++++++++++++++++++++++++++++++----------
 include/linux/mount.h |    3 +
 2 files changed, 102 insertions(+), 25 deletions(-)

diff --git a/fs/namespace.c b/fs/namespace.c
index 2c5f1f8..7bff436 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -1166,13 +1166,46 @@ static void expire_mount(struct vfsmount
 }
 
 /*
+ * go through the vfsmounts we've just consigned to the graveyard to
+ * - check that they're still dead
+ * - delete the vfsmount from the appropriate namespace under lock
+ * - dispose of the corpse
+ */
+static void expire_mount_list(struct list_head *graveyard, struct list_head *mounts)
+{
+	struct namespace *namespace;
+	struct vfsmount *mnt;
+
+	while (!list_empty(graveyard)) {
+		LIST_HEAD(umounts);
+		mnt = list_entry(graveyard->next, struct vfsmount, mnt_expire);
+		list_del_init(&mnt->mnt_expire);
+
+		/* don't do anything if the namespace is dead - all the
+		 * vfsmounts from it are going away anyway */
+		namespace = mnt->mnt_namespace;
+		if (!namespace || !namespace->root)
+			continue;
+		get_namespace(namespace);
+
+		spin_unlock(&vfsmount_lock);
+		down_write(&namespace_sem);
+		expire_mount(mnt, mounts, &umounts);
+		up_write(&namespace_sem);
+		release_mounts(&umounts);
+		mntput(mnt);
+		put_namespace(namespace);
+		spin_lock(&vfsmount_lock);
+	}
+}
+
+/*
  * process a list of expirable mountpoints with the intent of discarding any
  * mountpoints that aren't in use and haven't been touched since last we came
  * here
  */
 void mark_mounts_for_expiry(struct list_head *mounts)
 {
-	struct namespace *namespace;
 	struct vfsmount *mnt, *next;
 	LIST_HEAD(graveyard);
 
@@ -1196,38 +1229,79 @@ void mark_mounts_for_expiry(struct list_
 		list_move(&mnt->mnt_expire, &graveyard);
 	}
 
-	/*
-	 * go through the vfsmounts we've just consigned to the graveyard to
-	 * - check that they're still dead
-	 * - delete the vfsmount from the appropriate namespace under lock
-	 * - dispose of the corpse
-	 */
-	while (!list_empty(&graveyard)) {
-		LIST_HEAD(umounts);
-		mnt = list_entry(graveyard.next, struct vfsmount, mnt_expire);
-		list_del_init(&mnt->mnt_expire);
+	expire_mount_list(&graveyard, mounts);
 
-		/* don't do anything if the namespace is dead - all the
-		 * vfsmounts from it are going away anyway */
-		namespace = mnt->mnt_namespace;
-		if (!namespace || !namespace->root)
+	spin_unlock(&vfsmount_lock);
+}
+
+EXPORT_SYMBOL_GPL(mark_mounts_for_expiry);
+
+/*
+ * Ripoff of 'select_parent()'
+ *
+ * search the list of submounts for a given mountpoint, and move any
+ * shrinkable submounts to the 'graveyard' list.
+ */
+static int select_submounts(struct vfsmount *parent, struct list_head *graveyard)
+{
+	struct vfsmount *this_parent = parent;
+	struct list_head *next;
+	int found = 0;
+
+repeat:
+	next = this_parent->mnt_mounts.next;
+resume:
+	while (next != &this_parent->mnt_mounts) {
+		struct list_head *tmp = next;
+		struct vfsmount *mnt = list_entry(tmp, struct vfsmount, mnt_child);
+
+		next = tmp->next;
+		if (!(mnt->mnt_flags & MNT_SHRINKABLE))
 			continue;
-		get_namespace(namespace);
+		/*
+		 * Descend a level if the mnt_mounts list is non-empty.
+		 */
+		if (!list_empty(&mnt->mnt_mounts)) {
+			this_parent = mnt;
+			goto repeat;
+		}
 
-		spin_unlock(&vfsmount_lock);
-		down_write(&namespace_sem);
-		expire_mount(mnt, mounts, &umounts);
-		up_write(&namespace_sem);
-		release_mounts(&umounts);
-		mntput(mnt);
-		put_namespace(namespace);
-		spin_lock(&vfsmount_lock);
+		if (!propagate_mount_busy(mnt, 1)) {
+			mntget(mnt);
+			list_move_tail(&mnt->mnt_expire, graveyard);
+			found++;
+		}
 	}
+	/*
+	 * All done at this level ... ascend and resume the search
+	 */
+	if (this_parent != parent) {
+		next = this_parent->mnt_child.next;
+		this_parent = this_parent->mnt_parent;
+		goto resume;
+	}
+	return found;
+}
+
+/*
+ * process a list of expirable mountpoints with the intent of discarding any
+ * submounts of a specific parent mountpoint
+ */
+void shrink_submounts(struct vfsmount *mountpoint, struct list_head *mounts)
+{
+	LIST_HEAD(graveyard);
+	int found;
 
+	spin_lock(&vfsmount_lock);
+
+	/* extract submounts of 'mountpoint' from the expiration list */
+	while ((found = select_submounts(mountpoint, &graveyard)) != 0)
+		expire_mount_list(&graveyard, mounts);
+
 	spin_unlock(&vfsmount_lock);
 }
 
-EXPORT_SYMBOL_GPL(mark_mounts_for_expiry);
+EXPORT_SYMBOL_GPL(shrink_submounts);
 
 /*
  * Some copy_from_user() implementations do not return the exact number of
diff --git a/include/linux/mount.h b/include/linux/mount.h
index aff68c3..9b4e007 100644
--- a/include/linux/mount.h
+++ b/include/linux/mount.h
@@ -23,6 +23,8 @@ #define MNT_NOEXEC	0x04
 #define MNT_NOATIME	0x08
 #define MNT_NODIRATIME	0x10
 
+#define MNT_SHRINKABLE	0x100
+
 #define MNT_SHARED	0x1000	/* if the vfsmount is a shared mount */
 #define MNT_UNBINDABLE	0x2000	/* if the vfsmount is a unbindable mount */
 #define MNT_PNODE_MASK	0x3000	/* propogation flag mask */
@@ -84,6 +86,7 @@ extern int do_add_mount(struct vfsmount 
 			int mnt_flags, struct list_head *fslist);
 
 extern void mark_mounts_for_expiry(struct list_head *mounts);
+extern void shrink_submounts(struct vfsmount *mountpoint, struct list_head *mounts);
 
 extern spinlock_t vfsmount_lock;
 extern dev_t name_to_dev_t(char *name);
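
As a reading aid for select_submounts() and shrink_submounts() above, here
is a minimal userspace sketch of the same idea: a non-recursive walk that
only collects shrinkable, non-busy leaves, repeated until a pass finds
nothing. All demo_* names are hypothetical. Plain child/sibling pointers
stand in for the kernel lists, there is no locking, and a 'busy' field
stands in for propagate_mount_busy().

```c
#include <stddef.h>

#define DEMO_SHRINKABLE 0x1	/* hypothetical stand-in for MNT_SHRINKABLE */

struct demo_mnt {
	int flags;
	int busy;			/* stand-in for propagate_mount_busy() */
	int gone;			/* set once the mount has been "unmounted" */
	struct demo_mnt *parent;
	struct demo_mnt *child;		/* first child */
	struct demo_mnt *sibling;	/* next sibling */
};

static int demo_has_children(const struct demo_mnt *m)
{
	const struct demo_mnt *c;

	for (c = m->child; c != NULL; c = c->sibling)
		if (!c->gone)
			return 1;
	return 0;
}

/*
 * One pass: drop every shrinkable, non-busy leaf below 'root'.
 * Simplification: the leaf is marked gone immediately instead of being
 * moved to a graveyard list and unmounted after the walk.
 */
static int demo_select_pass(struct demo_mnt *root)
{
	struct demo_mnt *this_parent = root;
	struct demo_mnt *next = root->child;
	int found = 0;

	for (;;) {
		while (next != NULL) {
			struct demo_mnt *mnt = next;

			next = mnt->sibling;
			if (mnt->gone || !(mnt->flags & DEMO_SHRINKABLE))
				continue;	/* never descend below these */
			if (demo_has_children(mnt)) {
				this_parent = mnt;	/* descend a level */
				next = mnt->child;
				continue;
			}
			if (!mnt->busy) {
				mnt->gone = 1;
				found++;
			}
		}
		if (this_parent == root)
			break;
		/* ascend and resume after the parent, which is skipped */
		next = this_parent->sibling;
		this_parent = this_parent->parent;
	}
	return found;
}

/* Repeat until a pass finds nothing, peeling one layer of leaves each time. */
static int demo_shrink(struct demo_mnt *root)
{
	int total = 0, found;

	while ((found = demo_select_pass(root)) != 0)
		total += found;
	return total;
}
```

In this model a parent whose children were all dropped during one pass is
itself only collected on the following pass, which is why the outer loop
is needed.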

^ permalink raw reply related	[flat|nested] 14+ messages in thread

* RFC [PATCH 3/6] VFS: Remove dependency of ->umount_begin() call on MNT_FORCE
  2006-04-11 17:45 RFC [PATCH 0/6] Client support for crossing NFS server mountpoints Trond Myklebust
  2006-04-11 18:05 ` RFC [PATCH 1/6] VFS: Add GPL_EXPORTED function vfs_kern_mount() Trond Myklebust
  2006-04-11 18:05 ` RFC [PATCH 2/6] VFS: Add shrink_submounts() Trond Myklebust
@ 2006-04-11 18:05 ` Trond Myklebust
  2006-04-11 18:05 ` RFC [PATCH 4/6] NFS: Store the file system "fsid" value in the NFS super block Trond Myklebust
                   ` (3 subsequent siblings)
  6 siblings, 0 replies; 14+ messages in thread
From: Trond Myklebust @ 2006-04-11 18:05 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: nfs, nfsv4

From: Trond Myklebust <Trond.Myklebust@netapp.com>

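Call ->umount_begin() on every umount, and pass it the vfsmount and the
umount flags, so that each filesystem can decide for itself whether to
perform pre-umount processing when MNT_FORCE is not set.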
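
To illustrate the new calling convention, a minimal userspace sketch
follows. The demo_* types and DEMO_MNT_FORCE are hypothetical stand-ins,
not the kernel definitions; the point is only that the caller no longer
tests the force flag and the hook does.

```c
#include <stddef.h>

#define DEMO_MNT_FORCE 0x1	/* hypothetical stand-in for MNT_FORCE */

struct demo_vfsmount {
	int prepared;	/* set by work that runs on every umount */
	int cancelled;	/* set by work that runs only on forced umount */
};

/* New-style hook: always invoked, inspects the flags itself. */
static void demo_umount_begin(struct demo_vfsmount *mnt, int flags)
{
	mnt->prepared = 1;
	if (flags & DEMO_MNT_FORCE)
		mnt->cancelled = 1;
}

/* Caller side: only checks that a hook exists, not whether force is set. */
static void demo_do_umount(struct demo_vfsmount *mnt, int flags,
		void (*umount_begin)(struct demo_vfsmount *, int))
{
	if (umount_begin != NULL)
		umount_begin(mnt, flags);
}
```

A hook that wants the old behaviour simply returns early when the force
flag is clear, as the converted filesystems in this patch do.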

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
---

 fs/9p/vfs_super.c  |    7 ++++---
 fs/cifs/cifsfs.c   |    6 ++++--
 fs/fuse/inode.c    |    5 +++--
 fs/namespace.c     |    4 ++--
 fs/nfs/inode.c     |   14 +++++++++-----
 include/linux/fs.h |    2 +-
 6 files changed, 23 insertions(+), 15 deletions(-)

diff --git a/fs/9p/vfs_super.c b/fs/9p/vfs_super.c
index 61c599b..00c1f6b 100644
--- a/fs/9p/vfs_super.c
+++ b/fs/9p/vfs_super.c
@@ -253,11 +253,12 @@ static int v9fs_show_options(struct seq_
 }
 
 static void
-v9fs_umount_begin(struct super_block *sb)
+v9fs_umount_begin(struct vfsmount *vfsmnt, int flags)
 {
-	struct v9fs_session_info *v9ses = sb->s_fs_info;
+	struct v9fs_session_info *v9ses = vfsmnt->mnt_sb->s_fs_info;
 
-	v9fs_session_cancel(v9ses);
+	if (flags & MNT_FORCE)
+		v9fs_session_cancel(v9ses);
 }
 
 static struct super_operations v9fs_super_ops = {
diff --git a/fs/cifs/cifsfs.c b/fs/cifs/cifsfs.c
index d4b713e..8c60c53 100644
--- a/fs/cifs/cifsfs.c
+++ b/fs/cifs/cifsfs.c
@@ -404,12 +404,14 @@ static struct quotactl_ops cifs_quotactl
 #endif
 
 #ifdef CONFIG_CIFS_EXPERIMENTAL
-static void cifs_umount_begin(struct super_block * sblock)
+static void cifs_umount_begin(struct vfsmount * vfsmnt, int flags)
 {
 	struct cifs_sb_info *cifs_sb;
 	struct cifsTconInfo * tcon;
 
-	cifs_sb = CIFS_SB(sblock);
+	if (!(flags & MNT_FORCE))
+		return;
+	cifs_sb = CIFS_SB(vfsmnt->mnt_sb);
 	if(cifs_sb == NULL)
 		return;
 
diff --git a/fs/fuse/inode.c b/fs/fuse/inode.c
index fd34037..7b3d4e7 100644
--- a/fs/fuse/inode.c
+++ b/fs/fuse/inode.c
@@ -195,9 +195,10 @@ struct inode *fuse_iget(struct super_blo
 	return inode;
 }
 
-static void fuse_umount_begin(struct super_block *sb)
+static void fuse_umount_begin(struct vfsmount *vfsmnt, int flags)
 {
-	fuse_abort_conn(get_fuse_conn_super(sb));
+	if (flags & MNT_FORCE)
+		fuse_abort_conn(get_fuse_conn_super(vfsmnt->mnt_sb));
 }
 
 static void fuse_put_super(struct super_block *sb)
diff --git a/fs/namespace.c b/fs/namespace.c
index 7bff436..b21c5c2 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -576,8 +576,8 @@ static int do_umount(struct vfsmount *mn
 	 */
 
 	lock_kernel();
-	if ((flags & MNT_FORCE) && sb->s_op->umount_begin)
-		sb->s_op->umount_begin(sb);
+	if (sb->s_op->umount_begin)
+		sb->s_op->umount_begin(mnt, flags);
 	unlock_kernel();
 
 	/*
diff --git a/fs/nfs/inode.c b/fs/nfs/inode.c
index 1fd3452..cfcc585 100644
--- a/fs/nfs/inode.c
+++ b/fs/nfs/inode.c
@@ -64,7 +64,7 @@ static void nfs_destroy_inode(struct ino
 static int nfs_write_inode(struct inode *,int);
 static void nfs_delete_inode(struct inode *);
 static void nfs_clear_inode(struct inode *);
-static void nfs_umount_begin(struct super_block *);
+static void nfs_umount_begin(struct vfsmount *, int);
 static int  nfs_statfs(struct super_block *, struct kstatfs *);
 static int  nfs_show_options(struct seq_file *, struct vfsmount *);
 static int  nfs_show_stats(struct seq_file *, struct vfsmount *);
@@ -179,15 +179,19 @@ nfs_clear_inode(struct inode *inode)
 	BUG_ON(atomic_read(&nfsi->data_updates) != 0);
 }
 
-void
-nfs_umount_begin(struct super_block *sb)
+static void nfs_umount_begin(struct vfsmount *vfsmnt, int flags)
 {
-	struct rpc_clnt	*rpc = NFS_SB(sb)->client;
+	struct nfs_server *server;
+	struct rpc_clnt	*rpc;
 
+	if (!(flags & MNT_FORCE))
+		return;
 	/* -EIO all pending I/O */
+	server = NFS_SB(vfsmnt->mnt_sb);
+	rpc = server->client;
 	if (!IS_ERR(rpc))
 		rpc_killall_tasks(rpc);
-	rpc = NFS_SB(sb)->client_acl;
+	rpc = server->client_acl;
 	if (!IS_ERR(rpc))
 		rpc_killall_tasks(rpc);
 }
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 162c6e5..f83400a 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -1100,7 +1100,7 @@ struct super_operations {
 	int (*statfs) (struct super_block *, struct kstatfs *);
 	int (*remount_fs) (struct super_block *, int *, char *);
 	void (*clear_inode) (struct inode *);
-	void (*umount_begin) (struct super_block *);
+	void (*umount_begin) (struct vfsmount *, int);
 
 	int (*show_options)(struct seq_file *, struct vfsmount *);
 	int (*show_stats)(struct seq_file *, struct vfsmount *);

^ permalink raw reply related	[flat|nested] 14+ messages in thread

* RFC [PATCH 4/6] NFS: Store the file system "fsid" value in the NFS super block.
  2006-04-11 17:45 RFC [PATCH 0/6] Client support for crossing NFS server mountpoints Trond Myklebust
                   ` (2 preceding siblings ...)
  2006-04-11 18:05 ` RFC [PATCH 3/6] VFS: Remove dependency of ->umount_begin() call on MNT_FORCE Trond Myklebust
@ 2006-04-11 18:05 ` Trond Myklebust
  2006-04-11 18:05 ` RFC [PATCH 5/6] NFS: Ensure the client submounts, when it crosses a server mountpoint Trond Myklebust
                   ` (2 subsequent siblings)
  6 siblings, 0 replies; 14+ messages in thread
From: Trond Myklebust @ 2006-04-11 18:05 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: nfsv4, nfs

From: Trond Myklebust <Trond.Myklebust@netapp.com>

This should enable us to detect if we are crossing a mountpoint in the
case where the server is exporting "nohide" mounts.
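
The check this enables can be sketched in userspace as follows. The
demo_* names are hypothetical; the equality test mirrors the
nfs_fsid_equal() helper added below, and the NFSv2/v3 conversion mirrors
the decoders, which store the single wire value in 'major' and zero
'minor'.

```c
#include <stdint.h>

struct demo_fsid {
	uint64_t major;
	uint64_t minor;
};

static int demo_fsid_equal(const struct demo_fsid *a, const struct demo_fsid *b)
{
	return a->major == b->major && a->minor == b->minor;
}

/* NFSv2/v3 carry one fsid value on the wire; minor is always zero. */
static struct demo_fsid demo_fsid_from_v3(uint64_t wire_fsid)
{
	struct demo_fsid fsid;

	fsid.major = wire_fsid;
	fsid.minor = 0;
	return fsid;
}

/*
 * An object whose fsid differs from the one cached for the superblock
 * lives on another filesystem on the server, i.e. a mountpoint was crossed.
 */
static int demo_crosses_mountpoint(const struct demo_fsid *sb_fsid,
		const struct demo_fsid *obj_fsid)
{
	return !demo_fsid_equal(sb_fsid, obj_fsid);
}
```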

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
---

 fs/nfs/idmap.c            |    1 -
 fs/nfs/inode.c            |    8 ++++++++
 fs/nfs/nfs2xdr.c          |    3 ++-
 fs/nfs/nfs3xdr.c          |    3 ++-
 fs/nfs/nfs4xdr.c          |    4 ++--
 include/linux/nfs_fs.h    |    5 +++--
 include/linux/nfs_fs_sb.h |    1 +
 include/linux/nfs_page.h  |    1 -
 include/linux/nfs_xdr.h   |   19 ++++++++++++-------
 9 files changed, 30 insertions(+), 15 deletions(-)

diff --git a/fs/nfs/idmap.c b/fs/nfs/idmap.c
index 3fab5b0..b81e7ed 100644
--- a/fs/nfs/idmap.c
+++ b/fs/nfs/idmap.c
@@ -47,7 +47,6 @@ #include <linux/sunrpc/clnt.h>
 #include <linux/workqueue.h>
 #include <linux/sunrpc/rpc_pipe_fs.h>
 
-#include <linux/nfs_fs_sb.h>
 #include <linux/nfs_fs.h>
 
 #include <linux/nfs_idmap.h>
diff --git a/fs/nfs/inode.c b/fs/nfs/inode.c
index cfcc585..bf9d404 100644
--- a/fs/nfs/inode.c
+++ b/fs/nfs/inode.c
@@ -253,6 +253,7 @@ nfs_get_root(struct super_block *sb, str
 		return ERR_PTR(error);
 	}
 
+	server->fsid = fsinfo->fattr->fsid;
 	return nfs_fhget(sb, rootfh, fsinfo->fattr);
 }
 
@@ -1514,6 +1515,7 @@ out:
  */
 static int nfs_update_inode(struct inode *inode, struct nfs_fattr *fattr)
 {
+	struct nfs_server *server;
 	struct nfs_inode *nfsi = NFS_I(inode);
 	loff_t cur_isize, new_isize;
 	unsigned int	invalid = 0;
@@ -1531,6 +1533,12 @@ static int nfs_update_inode(struct inode
 	 */
 	if ((inode->i_mode & S_IFMT) != (fattr->mode & S_IFMT))
 		goto out_changed;
+
+	server = NFS_SERVER(inode);
+	/* Update the fsid if and only if this is the root directory */
+	if (inode == inode->i_sb->s_root->d_inode
+			&& !nfs_fsid_equal(&server->fsid, &fattr->fsid))
+		server->fsid = fattr->fsid;
 
 	/*
 	 * Update the read time so we don't revalidate too often.
diff --git a/fs/nfs/nfs2xdr.c b/fs/nfs/nfs2xdr.c
index f0015fa..a7ed88f 100644
--- a/fs/nfs/nfs2xdr.c
+++ b/fs/nfs/nfs2xdr.c
@@ -131,7 +131,8 @@ xdr_decode_fattr(u32 *p, struct nfs_fatt
 	fattr->du.nfs2.blocksize = ntohl(*p++);
 	rdev = ntohl(*p++);
 	fattr->du.nfs2.blocks = ntohl(*p++);
-	fattr->fsid_u.nfs3 = ntohl(*p++);
+	fattr->fsid.major = ntohl(*p++);
+	fattr->fsid.minor = 0;
 	fattr->fileid = ntohl(*p++);
 	p = xdr_decode_time(p, &fattr->atime);
 	p = xdr_decode_time(p, &fattr->mtime);
diff --git a/fs/nfs/nfs3xdr.c b/fs/nfs/nfs3xdr.c
index ec23361..f70eee2 100644
--- a/fs/nfs/nfs3xdr.c
+++ b/fs/nfs/nfs3xdr.c
@@ -166,7 +166,8 @@ xdr_decode_fattr(u32 *p, struct nfs_fatt
 	if (MAJOR(fattr->rdev) != major || MINOR(fattr->rdev) != minor)
 		fattr->rdev = 0;
 
-	p = xdr_decode_hyper(p, &fattr->fsid_u.nfs3);
+	p = xdr_decode_hyper(p, &fattr->fsid.major);
+	fattr->fsid.minor = 0;
 	p = xdr_decode_hyper(p, &fattr->fileid);
 	p = xdr_decode_time3(p, &fattr->atime);
 	p = xdr_decode_time3(p, &fattr->mtime);
diff --git a/fs/nfs/nfs4xdr.c b/fs/nfs/nfs4xdr.c
index 7c5d70e..7270d12 100644
--- a/fs/nfs/nfs4xdr.c
+++ b/fs/nfs/nfs4xdr.c
@@ -2217,7 +2217,7 @@ static int decode_attr_symlink_support(s
 	return 0;
 }
 
-static int decode_attr_fsid(struct xdr_stream *xdr, uint32_t *bitmap, struct nfs4_fsid *fsid)
+static int decode_attr_fsid(struct xdr_stream *xdr, uint32_t *bitmap, struct nfs_fsid *fsid)
 {
 	uint32_t *p;
 
@@ -2863,7 +2863,7 @@ static int decode_getfattr(struct xdr_st
 		goto xdr_error;
 	if ((status = decode_attr_size(xdr, bitmap, &fattr->size)) != 0)
 		goto xdr_error;
-	if ((status = decode_attr_fsid(xdr, bitmap, &fattr->fsid_u.nfs4)) != 0)
+	if ((status = decode_attr_fsid(xdr, bitmap, &fattr->fsid)) != 0)
 		goto xdr_error;
 	if ((status = decode_attr_fileid(xdr, bitmap, &fattr->fileid)) != 0)
 		goto xdr_error;
diff --git a/include/linux/nfs_fs.h b/include/linux/nfs_fs.h
index c71227d..83e2b8a 100644
--- a/include/linux/nfs_fs.h
+++ b/include/linux/nfs_fs.h
@@ -16,8 +16,6 @@ #include <linux/pagemap.h>
 #include <linux/rwsem.h>
 #include <linux/wait.h>
 
-#include <linux/nfs_fs_sb.h>
-
 #include <linux/sunrpc/debug.h>
 #include <linux/sunrpc/auth.h>
 #include <linux/sunrpc/clnt.h>
@@ -27,6 +25,9 @@ #include <linux/nfs2.h>
 #include <linux/nfs3.h>
 #include <linux/nfs4.h>
 #include <linux/nfs_xdr.h>
+
+#include <linux/nfs_fs_sb.h>
+
 #include <linux/rwsem.h>
 #include <linux/mempool.h>
 
diff --git a/include/linux/nfs_fs_sb.h b/include/linux/nfs_fs_sb.h
index 65dec21..6b4a13c 100644
--- a/include/linux/nfs_fs_sb.h
+++ b/include/linux/nfs_fs_sb.h
@@ -35,6 +35,7 @@ struct nfs_server {
 	char *			hostname;	/* remote hostname */
 	struct nfs_fh		fh;
 	struct sockaddr_in	addr;
+	struct nfs_fsid		fsid;
 	unsigned long		mount_time;	/* when this fs was mounted */
 #ifdef CONFIG_NFS_V4
 	/* Our own IP address, as a null-terminated string.
diff --git a/include/linux/nfs_page.h b/include/linux/nfs_page.h
index 66e2ed6..4cee1f8 100644
--- a/include/linux/nfs_page.h
+++ b/include/linux/nfs_page.h
@@ -13,7 +13,6 @@ #define _LINUX_NFS_PAGE_H
 #include <linux/list.h>
 #include <linux/pagemap.h>
 #include <linux/wait.h>
-#include <linux/nfs_fs_sb.h>
 #include <linux/sunrpc/auth.h>
 #include <linux/nfs_xdr.h>
 
diff --git a/include/linux/nfs_xdr.h b/include/linux/nfs_xdr.h
index c483e23..906c462 100644
--- a/include/linux/nfs_xdr.h
+++ b/include/linux/nfs_xdr.h
@@ -14,11 +14,19 @@ #define NFS_MAX_FILE_IO_SIZE	(1048576U)
 #define NFS_DEF_FILE_IO_SIZE	(4096U)
 #define NFS_MIN_FILE_IO_SIZE	(1024U)
 
-struct nfs4_fsid {
-	__u64 major;
-	__u64 minor;
+struct nfs_fsid {
+	uint64_t		major;
+	uint64_t		minor;
 };
 
+/*
+ * Helper for checking equality between 2 fsids.
+ */
+static inline int nfs_fsid_equal(const struct nfs_fsid *a, const struct nfs_fsid *b)
+{
+	return a->major == b->major && a->minor == b->minor;
+}
+
 struct nfs_fattr {
 	unsigned short		valid;		/* which fields are valid */
 	__u64			pre_size;	/* pre_op_attr.size	  */
@@ -40,10 +48,7 @@ struct nfs_fattr {
 		} nfs3;
 	} du;
 	dev_t			rdev;
-	union {
-		__u64		nfs3;		/* also nfs2 */
-		struct nfs4_fsid nfs4;
-	} fsid_u;
+	struct nfs_fsid		fsid;
 	__u64			fileid;
 	struct timespec		atime;
 	struct timespec		mtime;

^ permalink raw reply related	[flat|nested] 14+ messages in thread

* RFC [PATCH 5/6] NFS: Ensure the client submounts, when it crosses a server mountpoint.
  2006-04-11 17:45 RFC [PATCH 0/6] Client support for crossing NFS server mountpoints Trond Myklebust
                   ` (3 preceding siblings ...)
  2006-04-11 18:05 ` RFC [PATCH 4/6] NFS: Store the file system "fsid" value in the NFS super block Trond Myklebust
@ 2006-04-11 18:05 ` Trond Myklebust
  2006-04-11 18:05 ` RFC [PATCH 6/6] NFS: Add timeout to submounts Trond Myklebust
  2007-05-24  1:16 ` possible bug/oops in nfs_pageio_add_request (2.6.22-rc2)? Erez Zadok
  6 siblings, 0 replies; 14+ messages in thread
From: Trond Myklebust @ 2006-04-11 18:05 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: nfsv4, nfs

From: Trond Myklebust <Trond.Myklebust@netapp.com>

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
---

 fs/nfs/Makefile        |    3 
 fs/nfs/dir.c           |   16 +++
 fs/nfs/inode.c         |  303 ++++++++++++++++++++++++++++++++++++++++++++++++
 fs/nfs/namespace.c     |   89 ++++++++++++++
 fs/nfs/nfs4_fs.h       |    1 
 fs/nfs/nfs4proc.c      |    2 
 include/linux/nfs_fs.h |    9 +
 7 files changed, 418 insertions(+), 5 deletions(-)

diff --git a/fs/nfs/Makefile b/fs/nfs/Makefile
index ec61fd5..d9d494c 100644
--- a/fs/nfs/Makefile
+++ b/fs/nfs/Makefile
@@ -5,7 +5,8 @@ #
 obj-$(CONFIG_NFS_FS) += nfs.o
 
 nfs-y 			:= dir.o file.o inode.o nfs2xdr.o pagelist.o \
-			   proc.o read.o symlink.o unlink.o write.o
+			   proc.o read.o symlink.o unlink.o write.o \
+			   namespace.o
 nfs-$(CONFIG_ROOT_NFS)	+= nfsroot.o mount_clnt.o      
 nfs-$(CONFIG_NFS_V3)	+= nfs3proc.o nfs3xdr.o
 nfs-$(CONFIG_NFS_V3_ACL)	+= nfs3acl.o
diff --git a/fs/nfs/dir.c b/fs/nfs/dir.c
index a23f348..866672a 100644
--- a/fs/nfs/dir.c
+++ b/fs/nfs/dir.c
@@ -869,6 +869,17 @@ int nfs_is_exclusive_create(struct inode
 	return (nd->intent.open.flags & O_EXCL) != 0;
 }
 
+static inline int nfs_reval_fsid(struct inode *dir,
+		struct nfs_fh *fh, struct nfs_fattr *fattr)
+{
+	struct nfs_server *server = NFS_SERVER(dir);
+
+	if (!nfs_fsid_equal(&server->fsid, &fattr->fsid))
+		/* Revalidate fsid on root dir */
+		return __nfs_revalidate_inode(server, dir->i_sb->s_root->d_inode);
+	return 0;
+}
+
 static struct dentry *nfs_lookup(struct inode *dir, struct dentry * dentry, struct nameidata *nd)
 {
 	struct dentry *res;
@@ -897,6 +908,11 @@ static struct dentry *nfs_lookup(struct 
 	error = NFS_PROTO(dir)->lookup(dir, &dentry->d_name, &fhandle, &fattr);
 	if (error == -ENOENT)
 		goto no_entry;
+	if (error < 0) {
+		res = ERR_PTR(error);
+		goto out_unlock;
+	}
+	error = nfs_reval_fsid(dir, &fhandle, &fattr);
 	if (error < 0) {
 		res = ERR_PTR(error);
 		goto out_unlock;
diff --git a/fs/nfs/inode.c b/fs/nfs/inode.c
index bf9d404..f5a133f 100644
--- a/fs/nfs/inode.c
+++ b/fs/nfs/inode.c
@@ -238,6 +238,14 @@ nfs_block_size(unsigned long bsize, unsi
 	return nfs_block_bits(bsize, nrbitsp);
 }
 
+static inline void
+nfs_super_set_maxbytes(struct super_block *sb, __u64 maxfilesize)
+{
+	sb->s_maxbytes = (loff_t)maxfilesize;
+	if (sb->s_maxbytes > MAX_LFS_FILESIZE || sb->s_maxbytes <= 0)
+		sb->s_maxbytes = MAX_LFS_FILESIZE;
+}
+
 /*
  * Obtain the root inode of the file system.
  */
@@ -348,9 +356,7 @@ nfs_sb_init(struct super_block *sb, rpc_
 	}
 	server->backing_dev_info.ra_pages = server->rpages * NFS_MAX_READAHEAD;
 
-	sb->s_maxbytes = fsinfo.maxfilesize;
-	if (sb->s_maxbytes > MAX_LFS_FILESIZE) 
-		sb->s_maxbytes = MAX_LFS_FILESIZE; 
+	nfs_super_set_maxbytes(sb, fsinfo.maxfilesize);
 
 	server->client->cl_intr = (server->flags & NFS_MOUNT_INTR) ? 1 : 0;
 	server->client->cl_softrtry = (server->flags & NFS_MOUNT_SOFT) ? 1 : 0;
@@ -897,6 +903,11 @@ nfs_fhget(struct super_block *sb, struct
 			if (nfs_server_capable(inode, NFS_CAP_READDIRPLUS)
 			    && fattr->size <= NFS_LIMIT_READDIRPLUS)
 				set_bit(NFS_INO_ADVISE_RDPLUS, &NFS_FLAGS(inode));
+			/* Deal with crossing mountpoints */
+			if (!nfs_fsid_equal(&NFS_SB(sb)->fsid, &fattr->fsid)) {
+				inode->i_op = &nfs_mountpoint_inode_operations;
+				inode->i_fop = NULL;
+			}
 		} else if (S_ISLNK(inode->i_mode))
 			inode->i_op = &nfs_symlink_inode_operations;
 		else
@@ -1670,6 +1681,141 @@ #endif
 /*
  * File system information
  */
+
+/*
+ * nfs_path - reconstruct the path given an arbitrary dentry
+ * @base - arbitrary string to prepend to the path
+ * @dentry - pointer to dentry
+ * @buffer - result buffer
+ * @buflen - length of buffer
+ *
+ * Helper function for constructing the path from the
+ * root dentry to an arbitrary hashed dentry.
+ *
+ * This is mainly for use in figuring out the path on the
+ * server side when automounting on top of an existing partition.
+ */
+static char *nfs_path(const char *base, const struct dentry *dentry,
+		      char *buffer, ssize_t buflen)
+{
+	char *end = buffer+buflen;
+	int namelen;
+
+	*--end = '\0';
+	buflen--;
+	spin_lock(&dcache_lock);
+	while (!IS_ROOT(dentry)) {
+		namelen = dentry->d_name.len;
+		buflen -= namelen + 1;
+		if (buflen < 0)
+			goto Elong;
+		end -= namelen;
+		memcpy(end, dentry->d_name.name, namelen);
+		*--end = '/';
+		dentry = dentry->d_parent;
+	}
+	spin_unlock(&dcache_lock);
+	namelen = strlen(base);
+	/* Strip off excess slashes in base string */
+	while (namelen > 0 && base[namelen - 1] == '/')
+		namelen--;
+	buflen -= namelen;
+	if (buflen < 0)
+		goto Elong;
+	end -= namelen;
+	memcpy(end, base, namelen);
+	return end;
+Elong:
+	return ERR_PTR(-ENAMETOOLONG);
+}
+
+struct nfs_clone_mount {
+	const struct super_block *sb;
+	const struct dentry *dentry;
+	struct nfs_fh *fh;
+	struct nfs_fattr *fattr;
+};
+
+static struct super_block *nfs_clone_generic_sb(struct nfs_clone_mount *data,
+		struct super_block *(*clone_client)(struct nfs_server *, struct nfs_clone_mount *))
+{
+	struct nfs_server *server;
+	struct nfs_server *parent = NFS_SB(data->sb);
+	struct super_block *sb = ERR_PTR(-EINVAL);
+	void *err = ERR_PTR(-ENOMEM);
+	struct inode *root_inode;
+	struct nfs_fsinfo fsinfo;
+	int len;
+
+	server = kmalloc(sizeof(struct nfs_server), GFP_KERNEL);
+	if (server == NULL)
+		goto out_err;
+	memcpy(server, parent, sizeof(*server));
+	len = strlen(parent->hostname) + 1;
+	server->hostname = kmalloc(len, GFP_KERNEL);
+	if (server->hostname == NULL)
+		goto free_server;
+	memcpy(server->hostname, parent->hostname, len);
+	server->fsid = data->fattr->fsid;
+	nfs_copy_fh(&server->fh, data->fh);
+	if (rpciod_up() != 0)
+		goto free_hostname;
+
+	sb = clone_client(server, data);
+	if (IS_ERR((err = sb)) || sb->s_root)
+		goto kill_rpciod;
+
+	sb->s_op = data->sb->s_op;
+	sb->s_blocksize = data->sb->s_blocksize;
+	sb->s_blocksize_bits = data->sb->s_blocksize_bits;
+	sb->s_maxbytes = data->sb->s_maxbytes;
+
+	server->client_sys = server->client_acl = ERR_PTR(-EINVAL);
+	err = ERR_PTR(-ENOMEM);
+	server->io_stats = nfs_alloc_iostats();
+	if (server->io_stats == NULL)
+		goto out_deactivate;
+
+	server->client = rpc_clone_client(parent->client);
+	if (IS_ERR((err = server->client)))
+		goto out_deactivate;
+	if (!IS_ERR(parent->client_sys)) {
+		server->client_sys = rpc_clone_client(parent->client_sys);
+		if (IS_ERR((err = server->client_sys)))
+			goto out_deactivate;
+	}
+	if (!IS_ERR(parent->client_acl)) {
+		server->client_acl = rpc_clone_client(parent->client_acl);
+		if (IS_ERR((err = server->client_acl)))
+			goto out_deactivate;
+	}
+	root_inode = nfs_fhget(sb, data->fh, data->fattr);
+	if (!root_inode)
+		goto out_deactivate;
+	sb->s_root = d_alloc_root(root_inode);
+	if (!sb->s_root)
+		goto out_put_root;
+	fsinfo.fattr = data->fattr;
+	if (NFS_PROTO(root_inode)->fsinfo(server, data->fh, &fsinfo) == 0)
+		nfs_super_set_maxbytes(sb, fsinfo.maxfilesize);
+	sb->s_root->d_op = server->rpc_ops->dentry_ops;
+	sb->s_flags |= MS_ACTIVE;
+	return sb;
+out_put_root:
+	iput(root_inode);
+out_deactivate:
+	up_write(&sb->s_umount);
+	deactivate_super(sb);
+	return (struct super_block *)err;
+kill_rpciod:
+	rpciod_down();
+free_hostname:
+	kfree(server->hostname);
+free_server:
+	kfree(server);
+out_err:
+	return (struct super_block *)err;
+}
 
 static int nfs_set_super(struct super_block *s, void *data)
 {
@@ -1827,7 +1973,32 @@ static struct file_system_type nfs_fs_ty
 	.kill_sb	= nfs_kill_super,
 	.fs_flags	= FS_ODD_RENAME|FS_REVAL_DOT|FS_BINARY_MOUNTDATA,
 };
+
+static struct super_block *nfs_clone_client(struct nfs_server *server, struct nfs_clone_mount *data)
+{
+	struct super_block *sb;
 
+	sb = sget(&nfs_fs_type, nfs_compare_super, nfs_set_super, server);
+	if (!IS_ERR(sb) && sb->s_root == NULL && !(server->flags & NFS_MOUNT_NONLM))
+		lockd_up();
+	return sb;
+}
+
+static struct super_block *nfs_clone_nfs_sb(struct file_system_type *fs_type,
+		int flags, const char *dev_name, void *raw_data)
+{
+	struct nfs_clone_mount *data = raw_data;
+	return nfs_clone_generic_sb(data, nfs_clone_client);
+}
+
+static struct file_system_type clone_nfs_fs_type = {
+	.owner		= THIS_MODULE,
+	.name		= "nfs",
+	.get_sb		= nfs_clone_nfs_sb,
+	.kill_sb	= nfs_kill_super,
+	.fs_flags	= FS_ODD_RENAME|FS_REVAL_DOT|FS_BINARY_MOUNTDATA,
+};
+
 #ifdef CONFIG_NFS_V4
 
 static void nfs4_clear_inode(struct inode *);
@@ -2177,7 +2348,76 @@ static int param_set_idmap_timeout(const
 
 module_param_call(idmap_cache_timeout, param_set_idmap_timeout, param_get_int,
 		 &nfs_idmap_cache_timeout, 0644);
+
+/* Constructs the SERVER-side path */
+static inline char *nfs4_path(const struct dentry *dentry, char *buffer, ssize_t buflen)
+{
+	return nfs_path(NFS_SB(dentry->d_sb)->mnt_path, dentry, buffer, buflen);
+}
+
+static inline char *nfs4_dup_path(const struct dentry *dentry)
+{
+	char *page = (char *) __get_free_page(GFP_USER);
+	char *path;
 
+	path = nfs4_path(dentry, page, PAGE_SIZE);
+	if (!IS_ERR(path)) {
+		int len = PAGE_SIZE + page - path;
+		char *tmp = path;
+
+		path = kmalloc(len, GFP_KERNEL);
+		if (path)
+			memcpy(path, tmp, len);
+		else
+			path = ERR_PTR(-ENOMEM);
+	}
+	free_page((unsigned long)page);
+	return path;
+}
+
+static struct super_block *nfs4_clone_client(struct nfs_server *server, struct nfs_clone_mount *data)
+{
+	const struct dentry *dentry = data->dentry;
+	struct nfs4_client *clp = server->nfs4_state;
+	struct super_block *sb;
+
+	server->mnt_path = nfs4_dup_path(dentry);
+	if (IS_ERR(server->mnt_path)) {
+		sb = (struct super_block *)server->mnt_path;
+		goto err;
+	}
+	sb = sget(&nfs4_fs_type, nfs4_compare_super, nfs_set_super, server);
+	if (IS_ERR(sb) || sb->s_root)
+		goto free_path;
+	nfs4_server_capabilities(server, &server->fh);
+
+	down_write(&clp->cl_sem);
+	atomic_inc(&clp->cl_count);
+	list_add_tail(&server->nfs4_siblings, &clp->cl_superblocks);
+	up_write(&clp->cl_sem);
+	return sb;
+free_path:
+	kfree(server->mnt_path);
+err:
+	server->mnt_path = NULL;
+	return sb;
+}
+
+static struct super_block *nfs_clone_nfs4_sb(struct file_system_type *fs_type,
+		int flags, const char *dev_name, void *raw_data)
+{
+	struct nfs_clone_mount *data = raw_data;
+	return nfs_clone_generic_sb(data, nfs4_clone_client);
+}
+
+static struct file_system_type clone_nfs4_fs_type = {
+	.owner		= THIS_MODULE,
+	.name		= "nfs",
+	.get_sb		= nfs_clone_nfs4_sb,
+	.kill_sb	= nfs4_kill_super,
+	.fs_flags	= FS_ODD_RENAME|FS_REVAL_DOT|FS_BINARY_MOUNTDATA,
+};
+
 #define nfs4_init_once(nfsi) \
 	do { \
 		INIT_LIST_HEAD(&(nfsi)->open_states); \
@@ -2205,11 +2445,68 @@ static inline void unregister_nfs4fs(voi
 	nfs_unregister_sysctl();
 }
 #else
+#define nfs4_clone_client(a,b) ERR_PTR(-EINVAL)
 #define nfs4_init_once(nfsi) \
 	do { } while (0)
 #define register_nfs4fs() (0)
 #define unregister_nfs4fs()
 #endif
+
+static inline char *nfs_devname(const struct vfsmount *mnt_parent,
+			 const struct dentry *dentry,
+			 char *buffer, ssize_t buflen)
+{
+	return nfs_path(mnt_parent->mnt_devname, dentry, buffer, buflen);
+}
+
+/**
+ * nfs_do_submount - set up mountpoint when crossing a filesystem boundary
+ * @mnt_parent - mountpoint of parent directory
+ * @dentry - parent directory
+ * @fh - filehandle for new root dentry
+ * @fattr - attributes for new root inode
+ *
+ */
+struct vfsmount *nfs_do_submount(const struct vfsmount *mnt_parent,
+		const struct dentry *dentry, struct nfs_fh *fh,
+		struct nfs_fattr *fattr)
+{
+	struct nfs_clone_mount mountdata = {
+		.sb = mnt_parent->mnt_sb,
+		.dentry = dentry,
+		.fh = fh,
+		.fattr = fattr,
+	};
+	struct vfsmount *mnt = ERR_PTR(-ENOMEM);
+	char *page = (char *) __get_free_page(GFP_USER);
+	char *devname;
+
+	dprintk("%s: submounting on %s/%s\n", __FUNCTION__,
+			dentry->d_parent->d_name.name,
+			dentry->d_name.name);
+	if (page == NULL)
+		goto out;
+	devname = nfs_devname(mnt_parent, dentry, page, PAGE_SIZE);
+	mnt = (struct vfsmount *)devname;
+	if (IS_ERR(devname))
+		goto free_page;
+	switch (NFS_SB(mnt_parent->mnt_sb)->rpc_ops->version) {
+		case 2:
+		case 3:
+			mnt = vfs_kern_mount(&clone_nfs_fs_type, 0, devname, &mountdata);
+			break;
+		case 4:
+			mnt = vfs_kern_mount(&clone_nfs4_fs_type, 0, devname, &mountdata);
+			break;
+		default:
+			BUG();
+	}
+free_page:
+	free_page((unsigned long)page);
+out:
+	dprintk("%s: done\n", __FUNCTION__);
+	return mnt;
+}
 
 extern int nfs_init_nfspagecache(void);
 extern void nfs_destroy_nfspagecache(void);
diff --git a/fs/nfs/namespace.c b/fs/nfs/namespace.c
new file mode 100644
index 0000000..a155505
--- /dev/null
+++ b/fs/nfs/namespace.c
@@ -0,0 +1,89 @@
+/*
+ * linux/fs/nfs/namespace.c
+ *
+ * Copyright (C) 2005 Trond Myklebust <Trond.Myklebust@netapp.com>
+ *
+ * NFS namespace
+ */
+
+#include <linux/config.h>
+
+#include <linux/dcache.h>
+#include <linux/mount.h>
+#include <linux/namei.h>
+#include <linux/nfs_fs.h>
+#include <linux/string.h>
+#include <linux/sunrpc/clnt.h>
+#include <linux/vfs.h>
+
+#define NFSDBG_FACILITY		NFSDBG_VFS
+
+/*
+ * nfs_follow_mountpoint - handle crossing a mountpoint on the server
+ * @dentry - dentry of mountpoint
+ * @nd - nameidata info
+ *
+ * When we encounter a mountpoint on the server, we want to set up
+ * a mountpoint on the client too, to prevent inode numbers from
+ * colliding, and to allow "df" to work properly.
+ * On NFSv4, we also want to allow for the fact that different
+ * filesystems may be migrated to different servers in a failover
+ * situation, and that different filesystems may want to use
+ * different security flavours.
+ */
+static void * nfs_follow_mountpoint(struct dentry *dentry, struct nameidata *nd)
+{
+	struct vfsmount *mnt;
+	struct nfs_server *server = NFS_SERVER(dentry->d_inode);
+	struct dentry *parent;
+	struct nfs_fh fh;
+	struct nfs_fattr fattr;
+	int err;
+
+	BUG_ON(IS_ROOT(dentry));
+	dprintk("%s: enter\n", __FUNCTION__);
+	dput(nd->dentry);
+	nd->dentry = dget(dentry);
+	if (d_mountpoint(nd->dentry))
+		goto out_follow;
+	/* Look it up again */
+	parent = dget_parent(nd->dentry);
+	err = server->rpc_ops->lookup(parent->d_inode, &nd->dentry->d_name, &fh, &fattr);
+	dput(parent);
+	if (err != 0)
+		goto out_err;
+
+	mnt = nfs_do_submount(nd->mnt, nd->dentry, &fh, &fattr);
+	err = PTR_ERR(mnt);
+	if (IS_ERR(mnt))
+		goto out_err;
+
+	mntget(mnt);
+	err = do_add_mount(mnt, nd, nd->mnt->mnt_flags, NULL);
+	if (err < 0) {
+		mntput(mnt);
+		if (err == -EBUSY)
+			goto out_follow;
+		goto out_err;
+	}
+	mntput(nd->mnt);
+	dput(nd->dentry);
+	nd->mnt = mnt;
+	nd->dentry = dget(mnt->mnt_root);
+out:
+	dprintk("%s: done, returned %d\n", __FUNCTION__, err);
+	return ERR_PTR(err);
+out_err:
+	path_release(nd);
+	goto out;
+out_follow:
+	while(d_mountpoint(nd->dentry) && follow_down(&nd->mnt, &nd->dentry))
+		;
+	err = 0;
+	goto out;
+}
+
+struct inode_operations nfs_mountpoint_inode_operations = {
+	.follow_link	= nfs_follow_mountpoint,
+	.getattr	= nfs_getattr,
+};
diff --git a/fs/nfs/nfs4_fs.h b/fs/nfs/nfs4_fs.h
index 0f5e4e7..307832f 100644
--- a/fs/nfs/nfs4_fs.h
+++ b/fs/nfs/nfs4_fs.h
@@ -217,6 +217,7 @@ extern int nfs4_proc_renew(struct nfs4_c
 extern int nfs4_do_close(struct inode *inode, struct nfs4_state *state);
 extern struct dentry *nfs4_atomic_open(struct inode *, struct dentry *, struct nameidata *);
 extern int nfs4_open_revalidate(struct inode *, struct dentry *, int, struct nameidata *);
+extern int nfs4_server_capabilities(struct nfs_server *server, struct nfs_fh *fhandle);
 
 extern struct nfs4_state_recovery_ops nfs4_reboot_recovery_ops;
 extern struct nfs4_state_recovery_ops nfs4_network_partition_recovery_ops;
diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c
index 86f81a7..e108142 100644
--- a/fs/nfs/nfs4proc.c
+++ b/fs/nfs/nfs4proc.c
@@ -1331,7 +1331,7 @@ static int _nfs4_server_capabilities(str
 	return status;
 }
 
-static int nfs4_server_capabilities(struct nfs_server *server, struct nfs_fh *fhandle)
+int nfs4_server_capabilities(struct nfs_server *server, struct nfs_fh *fhandle)
 {
 	struct nfs4_exception exception = { };
 	int err;
diff --git a/include/linux/nfs_fs.h b/include/linux/nfs_fs.h
index 83e2b8a..7cd75e0 100644
--- a/include/linux/nfs_fs.h
+++ b/include/linux/nfs_fs.h
@@ -308,6 +308,10 @@ extern void nfs_end_data_update(struct i
 extern struct nfs_open_context *get_nfs_open_context(struct nfs_open_context *ctx);
 extern void put_nfs_open_context(struct nfs_open_context *ctx);
 extern struct nfs_open_context *nfs_find_open_context(struct inode *inode, struct rpc_cred *cred, int mode);
+extern struct vfsmount *nfs_do_submount(const struct vfsmount *mnt_parent,
+					const struct dentry *dentry,
+					struct nfs_fh *fh,
+					struct nfs_fattr *fattr);
 
 /* linux/net/ipv4/ipconfig.c: trims ip addr off front of name, too. */
 extern u32 root_nfs_parse_addr(char *name); /*__init*/
@@ -392,6 +396,11 @@ #else
 #define nfs_register_sysctl() 0
 #define nfs_unregister_sysctl() do { } while(0)
 #endif
+
+/*
+ * linux/fs/nfs/namespace.c
+ */
+extern struct inode_operations nfs_mountpoint_inode_operations;
 
 /*
  * linux/fs/nfs/unlink.c


* RFC [PATCH 6/6] NFS: Add timeout to submounts
  2006-04-11 17:45 RFC [PATCH 0/6] Client support for crossing NFS server mountpoints Trond Myklebust
                   ` (4 preceding siblings ...)
  2006-04-11 18:05 ` RFC [PATCH 5/6] NFS: Ensure the client submounts, when it crosses a server mountpoint Trond Myklebust
@ 2006-04-11 18:05 ` Trond Myklebust
  2007-05-24  1:16 ` possible bug/oops in nfs_pageio_add_request (2.6.22-rc2)? Erez Zadok
  6 siblings, 0 replies; 14+ messages in thread
From: Trond Myklebust @ 2006-04-11 18:05 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: nfs, nfsv4

From: Trond Myklebust <Trond.Myklebust@netapp.com>

Make automounted submounts expire using the mark_mounts_for_expiry()
function. The timeout is controlled via the new "nfs_mountpoint_timeout"
sysctl and defaults to 500 seconds (500 * HZ jiffies).

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
---

 fs/nfs/inode.c         |    3 +++
 fs/nfs/namespace.c     |   25 ++++++++++++++++++++++++-
 fs/nfs/sysctl.c        |   10 ++++++++++
 include/linux/nfs_fs.h |    3 +++
 4 files changed, 40 insertions(+), 1 deletions(-)

diff --git a/fs/nfs/inode.c b/fs/nfs/inode.c
index f5a133f..e051d00 100644
--- a/fs/nfs/inode.c
+++ b/fs/nfs/inode.c
@@ -184,6 +184,7 @@ static void nfs_umount_begin(struct vfsm
 	struct nfs_server *server;
 	struct rpc_clnt	*rpc;
 
+	shrink_submounts(vfsmnt, &nfs_automount_list);
 	if (!(flags & MNT_FORCE))
 		return;
 	/* -EIO all pending I/O */
@@ -1964,6 +1965,7 @@ static void nfs_kill_super(struct super_
 	nfs_free_iostats(server->io_stats);
 	kfree(server->hostname);
 	kfree(server);
+	nfs_release_automount_timer();
 }
 
 static struct file_system_type nfs_fs_type = {
@@ -2310,6 +2312,7 @@ static void nfs4_kill_super(struct super
 	nfs_free_iostats(server->io_stats);
 	kfree(server->hostname);
 	kfree(server);
+	nfs_release_automount_timer();
 }
 
 static struct file_system_type nfs4_fs_type = {
diff --git a/fs/nfs/namespace.c b/fs/nfs/namespace.c
index a155505..e426516 100644
--- a/fs/nfs/namespace.c
+++ b/fs/nfs/namespace.c
@@ -18,6 +18,11 @@ #include <linux/vfs.h>
 
 #define NFSDBG_FACILITY		NFSDBG_VFS
 
+LIST_HEAD(nfs_automount_list);
+static void nfs_expire_automounts(void *list);
+static DECLARE_WORK(nfs_automount_task, nfs_expire_automounts, &nfs_automount_list);
+int nfs_mountpoint_expiry_timeout = 500 * HZ;
+
 /*
  * nfs_follow_mountpoint - handle crossing a mountpoint on the server
  * @dentry - dentry of mountpoint
@@ -59,7 +64,7 @@ static void * nfs_follow_mountpoint(stru
 		goto out_err;
 
 	mntget(mnt);
-	err = do_add_mount(mnt, nd, nd->mnt->mnt_flags, NULL);
+	err = do_add_mount(mnt, nd, nd->mnt->mnt_flags|MNT_SHRINKABLE, &nfs_automount_list);
 	if (err < 0) {
 		mntput(mnt);
 		if (err == -EBUSY)
@@ -70,6 +75,7 @@ static void * nfs_follow_mountpoint(stru
 	dput(nd->dentry);
 	nd->mnt = mnt;
 	nd->dentry = dget(mnt->mnt_root);
+	schedule_delayed_work(&nfs_automount_task, nfs_mountpoint_expiry_timeout);
 out:
 	dprintk("%s: done, returned %d\n", __FUNCTION__, err);
 	return ERR_PTR(err);
@@ -87,3 +93,20 @@ struct inode_operations nfs_mountpoint_i
 	.follow_link	= nfs_follow_mountpoint,
 	.getattr	= nfs_getattr,
 };
+
+static void nfs_expire_automounts(void *data)
+{
+	struct list_head *list = (struct list_head *)data;
+
+	mark_mounts_for_expiry(list);
+	if (!list_empty(list))
+		schedule_delayed_work(&nfs_automount_task, nfs_mountpoint_expiry_timeout);
+}
+
+void nfs_release_automount_timer(void)
+{
+	if (list_empty(&nfs_automount_list)) {
+		cancel_delayed_work(&nfs_automount_task);
+		flush_scheduled_work();
+	}
+}
diff --git a/fs/nfs/sysctl.c b/fs/nfs/sysctl.c
index 4c486eb..db61e51 100644
--- a/fs/nfs/sysctl.c
+++ b/fs/nfs/sysctl.c
@@ -12,6 +12,7 @@ #include <linux/sysctl.h>
 #include <linux/module.h>
 #include <linux/nfs4.h>
 #include <linux/nfs_idmap.h>
+#include <linux/nfs_fs.h>
 
 #include "callback.h"
 
@@ -46,6 +47,15 @@ #ifdef CONFIG_NFS_V4
 		.strategy = &sysctl_jiffies,
 	},
 #endif
+	{
+		.ctl_name	= CTL_UNNUMBERED,
+		.procname	= "nfs_mountpoint_timeout",
+		.data		= &nfs_mountpoint_expiry_timeout,
+		.maxlen		= sizeof(nfs_mountpoint_expiry_timeout),
+		.mode		= 0644,
+		.proc_handler	= &proc_dointvec_jiffies,
+		.strategy	= &sysctl_jiffies,
+	},
 	{ .ctl_name = 0 }
 };
 
diff --git a/include/linux/nfs_fs.h b/include/linux/nfs_fs.h
index 7cd75e0..fe1e962 100644
--- a/include/linux/nfs_fs.h
+++ b/include/linux/nfs_fs.h
@@ -400,7 +400,10 @@ #endif
 /*
  * linux/fs/nfs/namespace.c
  */
+extern struct list_head nfs_automount_list;
 extern struct inode_operations nfs_mountpoint_inode_operations;
+extern int nfs_mountpoint_expiry_timeout;
+extern void nfs_release_automount_timer(void);
 
 /*
  * linux/fs/nfs/unlink.c
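
The expiry scheme in the patch above can be modelled in user space. The
sketch below is a simplified model, not kernel code: it assumes the two-pass
behaviour of mark_mounts_for_expiry() (one pass marks an idle mount, the next
pass unmounts it if the mark is still set and the mount is not in use), and
every model_* name is hypothetical. The return value stands in for the
"!list_empty(list)" test that decides whether the delayed work is re-armed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical user-space stand-in for one automounted submount. */
struct model_mount {
	bool mounted;		/* still attached to the namespace */
	bool in_use;		/* models a busy mount (extra references) */
	bool expiry_mark;	/* set by one pass, checked by the next */
};

/* Models a path walk touching the mount: any use clears the mark. */
void model_touch(struct model_mount *m)
{
	m->expiry_mark = false;
}

/*
 * One expiry pass over the list.  Returns the number of mounts still
 * mounted; a caller would re-arm its timer only if this is non-zero.
 */
size_t model_expire_pass(struct model_mount *m, size_t n)
{
	size_t remaining = 0;

	for (size_t i = 0; i < n; i++) {
		if (!m[i].mounted)
			continue;
		if (!m[i].in_use && m[i].expiry_mark) {
			/* Idle for a whole interval: unmount it. */
			m[i].mounted = false;
			continue;
		}
		/* First sighting (or busy): mark it and look again later. */
		m[i].expiry_mark = true;
		remaining++;
	}
	return remaining;
}
```

In this model a mount that is never touched disappears on the second pass,
so its lifetime is between one and two timeout intervals.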


* possible bug/oops in nfs_pageio_add_request (2.6.22-rc2)?
  2006-04-11 17:45 RFC [PATCH 0/6] Client support for crossing NFS server mountpoints Trond Myklebust
                   ` (5 preceding siblings ...)
  2006-04-11 18:05 ` RFC [PATCH 6/6] NFS: Add timeout to submounts Trond Myklebust
@ 2007-05-24  1:16 ` Erez Zadok
  2007-05-24 12:51   ` Trond Myklebust
  6 siblings, 1 reply; 14+ messages in thread
From: Erez Zadok @ 2007-05-24  1:16 UTC (permalink / raw)
  To: linux-fsdevel, nfs, trond.myklebust; +Cc: mhalcrow

I've hit a NULL pointer dereference on desc->pg_error (see below), triggered
by writing to an mmap'ed file through a stackable file system mounted on top
of NFSv3:

// from file: nfs/pagelist.c
int nfs_pageio_add_request(struct nfs_pageio_descriptor *desc,
			   struct nfs_page *req)
{
	while (!nfs_pageio_do_add_request(desc, req)) {
		nfs_pageio_doio(desc);
		if (desc->pg_error < 0)

Scenario:

2.6.22-rc2 with Unionfs 2.0 (release u2 for 2.6.22-rc2, which includes mmap
support).

I mount unionfs on top of nfs (v3).  I have one file in the nfs branch.  I
run a simple program through the union which mmap's the file, changes the
first byte of the file, calls msync(), and then closes.  This causes
unionfs_writepage to be invoked, which in turn calls the lower file system's
->writepage, here nfs_writepage.

The 'wbc' that's passed to unionfs_writepage from the VFS has this:

    wbc->for_writepages = 1
    wbc->fs_private = NULL

If you follow the logic, then nfs_writepage calls nfs_writepage_locked,
passing the same wbc.  nfs_writepage_locked does this:

	if (wbc->for_writepages)
		pgio = wbc->fs_private;
	else {
		nfs_pageio_init_write(&mypgio, inode, wb_priority(wbc));
		pgio = &mypgio;
	}

which means that pgio is set to NULL from the caller's wbc.  Then
nfs_writepage_locked calls nfs_page_async_flush, passing it this pgio
(NULL).  nfs_page_async_flush invokes nfs_pageio_add_request, passing it
this NULL pgio.  Inside nfs_pageio_add_request the NULL is being
dereferenced as desc->pg_error and we get an oops.

As a workaround, in unionfs_writepage I tried this before calling the lower
file system's ->writepage (which was nfs_writepage):

	struct writeback_control lower_wbc;
	memcpy(&lower_wbc, wbc, sizeof(struct writeback_control));
	if (lower_wbc.for_writepages && !lower_wbc.fs_private) {
		printk("unionfs: setting wbc.for_writepages to 0\n");
		lower_wbc.for_writepages = 0;
	}

Then I passed &lower_wbc to the lower file system's writepage method
(nfs_writepage).  It works; no oops, and the file in question was sync'ed to
the backing f/s too.  But I'm not sure if it's the correct workaround and
whether it'd break things for other non-NFS file systems.
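
The failure and the workaround described above can be reproduced in a small
user-space model. This is only a sketch: the model_* structures and functions
are hypothetical stand-ins that keep just the two wbc fields and the
descriptor-selection branch quoted in this message, and nothing here is the
real NFS or unionfs code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel structures named above. */
struct model_pgio {
	int pg_error;
};

struct model_wbc {
	int for_writepages;
	void *fs_private;
};

/*
 * Mirrors the branch quoted from nfs_writepage_locked: trust fs_private
 * whenever for_writepages is set, otherwise use a local descriptor.
 * Returns NULL in exactly the case that leads to the oops.
 */
struct model_pgio *model_select_pgio(const struct model_wbc *wbc,
				     struct model_pgio *mypgio)
{
	if (wbc->for_writepages)
		return wbc->fs_private;
	mypgio->pg_error = 0;
	return mypgio;
}

/*
 * Mirrors the workaround: hand the lower file system a copy of the wbc
 * with for_writepages cleared when no descriptor was supplied.
 */
struct model_wbc model_lower_wbc(const struct model_wbc *wbc)
{
	struct model_wbc lower = *wbc;

	if (lower.for_writepages && lower.fs_private == NULL)
		lower.for_writepages = 0;
	return lower;
}
```

With for_writepages = 1 and fs_private = NULL, model_select_pgio() returns
NULL for the original wbc and the local descriptor for the copy returned by
model_lower_wbc(); the caller's wbc is left unmodified.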

It's possible that I'm doing something wrong in unionfs's mmap code, which
indirectly results in a malformed wbc structure being passed to unionfs (by
malformed I mean that wbc->fs_private is NULL and wbc->for_writepages is set
to 1).  If such a wbc can be created by any other means and passed to NFS,
then nfs probably will continue to oops even w/o unionfs.

FWIW, I tried a similar scenario with eCryptfs (another stackable f/s in
2.6.22-rc2) on top of NFSv3, and got the same oops (sorry, Mike :-)

Any pointers would be appreciated.

Thanks,
Erez.


* Re: possible bug/oops in nfs_pageio_add_request (2.6.22-rc2)?
  2007-05-24  1:16 ` possible bug/oops in nfs_pageio_add_request (2.6.22-rc2)? Erez Zadok
@ 2007-05-24 12:51   ` Trond Myklebust
  0 siblings, 0 replies; 14+ messages in thread
From: Trond Myklebust @ 2007-05-24 12:51 UTC (permalink / raw)
  To: Erez Zadok; +Cc: linux-fsdevel, nfs, mhalcrow

On Wed, 2007-05-23 at 21:16 -0400, Erez Zadok wrote:
> I've hit a NULL ptr deref on desc->pg_error below, triggered when mounting a
> stackable file system on top of nfsv3:
> 
> // from file: nfs/pagelist.c
> int nfs_pageio_add_request(struct nfs_pageio_descriptor *desc,
> 			   struct nfs_page *req)
> {
> 	while (!nfs_pageio_do_add_request(desc, req)) {
> 		nfs_pageio_doio(desc);
> 		if (desc->pg_error < 0)
> 
> Scenario:
> 
> 2.6.22-rc2 with Unionfs 2.0 (release u2 for 2.6.22-rc2, which includes mmap
> support).
> 
> I mount unionfs on top of nfs (v3).  I have one file in the nfs branch.  I
> run a simple program through the union which mmap's the file, changes the
> first byte of the file, calls msync(), and then closes.  This causes
> unionfs_writepage to be invoked, which in turn calls the lower file system's
> ->writepage, here nfs_writepage.
> 
> The 'wbc' that's passed to unionfs_writepage from the VFS has this:
> 
>     wbc->for_writepages = 1
>     wbc->fs_private = NULL
> 
> If you follow the logic, then nfs_writepage calls nfs_writepage_locked,
> passing the same wbc.  nfs_writepage_locked does this:
> 
> 	if (wbc->for_writepages)
> 		pgio = wbc->fs_private;
> 	else {
> 		nfs_pageio_init_write(&mypgio, inode, wb_priority(wbc));
> 		pgio = &mypgio;
> 	}
> 
> which means that pgio is set to NULL from the caller's wbc.  Then
> nfs_writepage_locked calls nfs_page_async_flush, passing it this pgio
> (NULL).  nfs_page_async_flush invokes nfs_pageio_add_request, passing it
> this NULL pgio.  Inside nfs_pageio_add_request the NULL is being
> dereferenced as desc->pg_error and we get an oops.
> 
> As a workaround, in unionfs_writepage I tried this before calling the lower
> file system's ->writepage (which was nfs_writepage):
> 
> 	struct writeback_control lower_wbc;
> 	memcpy(&lower_wbc, wbc, sizeof(struct writeback_control));
> 	if (lower_wbc.for_writepages && !lower_wbc.fs_private) {
> 		printk("unionfs: setting wbc.for_writepages to 0\n");
> 		lower_wbc.for_writepages = 0;
> 	}
> 
> Then I passed &lower_wbc to the lower file system's writepage method
> (nfs_writepage).  It works; no oops, and the file in question was sync'ed to
> the backing f/s too.  But I'm not sure if it's the correct workaround and
> whether it'd break things for other non-NFS file systems.
> 
> It's possible that I'm doing something wrong in unionfs's mmap code, which
> indirectly results in a malformed wbc structure being passed to unionfs (by
> malformed I mean that wbc->fs_private is NULL and wbc->for_writepages is set
> to 1).  If such a wbc can be created by any other means and passed to NFS,
> then nfs probably will continue to oops even w/o unionfs.
> 
> FWIW, I tried a similar scenario with eCryptfs (another stackable f/s in
> 2.6.22-rc2) on top of NFSv3, and got the same oops (sorry, Mike :-)
> 
> Any pointers would be appreciated.

If this is truly a call to ->writepages() by the VFS (as opposed to a
call to ->writepage()) then why is unionfs' writepages() failing to call
the underlying writepages method of the host filesystem: in this case
nfs_writepages()?

Trond
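
The alternative raised in this reply, forwarding the batched write to the
lower file system's writepages method rather than calling its writepage
method page by page, can be sketched in user space as follows. The model
assumes that the lower writepages implementation is what installs the
descriptor in wbc->fs_private; all names are hypothetical and the error
return merely stands in for the oops.

```c
#include <assert.h>
#include <stddef.h>

struct model_wbc {
	int for_writepages;
	void *fs_private;
};

/* Hypothetical address-space operations of the lower file system. */
struct model_aops {
	int (*writepage)(struct model_wbc *wbc);
	int (*writepages)(struct model_wbc *wbc, int npages);
};

/* Lower writepage: fails where the real code would dereference NULL. */
int lower_writepage(struct model_wbc *wbc)
{
	if (wbc->for_writepages && wbc->fs_private == NULL)
		return -1;
	return 0;
}

/* Lower writepages: sets up the shared descriptor, then writes pages. */
int lower_writepages(struct model_wbc *wbc, int npages)
{
	int desc = 0;	/* stand-in for the per-call I/O descriptor */
	int err = 0;

	wbc->fs_private = &desc;
	for (int i = 0; i < npages; i++)
		if (lower_writepage(wbc) < 0)
			err = -1;
	wbc->fs_private = NULL;
	return err;
}

const struct model_aops lower_aops = {
	.writepage = lower_writepage,
	.writepages = lower_writepages,
};

/* Stacking layer: forward the whole batch to the lower writepages. */
int upper_writepages(const struct model_aops *lower,
		     struct model_wbc *wbc, int npages)
{
	return lower->writepages(wbc, npages);
}
```

Calling lower_aops.writepage directly with for_writepages set fails in this
model, whereas upper_writepages() succeeds because the lower layer prepares
its own descriptor first.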



Thread overview: 14+ messages
2006-04-11 17:45 RFC [PATCH 0/6] Client support for crossing NFS server mountpoints Trond Myklebust
2006-04-11 18:05 ` RFC [PATCH 1/6] VFS: Add GPL_EXPORTED function vfs_kern_mount() Trond Myklebust
2006-04-17 18:52   ` Christoph Hellwig
2006-04-17 19:35     ` Trond Myklebust
2006-04-17 19:39       ` Christoph Hellwig
2006-04-17 20:44         ` Trond Myklebust
2006-04-17 23:39           ` Trond Myklebust
2006-04-11 18:05 ` RFC [PATCH 2/6] VFS: Add shrink_submounts() Trond Myklebust
2006-04-11 18:05 ` RFC [PATCH 3/6] VFS: Remove dependency of ->umount_begin() call on MNT_FORCE Trond Myklebust
2006-04-11 18:05 ` RFC [PATCH 4/6] NFS: Store the file system "fsid" value in the NFS super block Trond Myklebust
2006-04-11 18:05 ` RFC [PATCH 5/6] NFS: Ensure the client submounts, when it crosses a server mountpoint Trond Myklebust
2006-04-11 18:05 ` RFC [PATCH 6/6] NFS: Add timeout to submounts Trond Myklebust
2007-05-24  1:16 ` possible bug/oops in nfs_pageio_add_request (2.6.22-rc2)? Erez Zadok
2007-05-24 12:51   ` Trond Myklebust
