[PATCH 0/5] RCU-walk support for autofs

public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed

* [PATCH 0/5] RCU-walk support for autofs
@ 2014-08-18  6:33 NeilBrown
  2014-08-18  6:33 ` [PATCH 1/5] autofs4: allow RCU-walk to walk through autofs4 NeilBrown
                   ` (5 more replies)
  0 siblings, 6 replies; 17+ messages in thread
From: NeilBrown @ 2014-08-18  6:33 UTC (permalink / raw)
  To: Ian Kent; +Cc: autofs, linux-kernel

Hi Ian,
 Have you had a chance to run your tests in these patches yet?
 I've done what testing I can think of and cannot fault them.

 This set is against 3.17-rc1 and make use of the new -EISDIR handling
 for d_manage() and assumes the other patches which already went in
 through Andrew Morton.

 I've added a section to autofs4.txt about mount namespaces, but it is
 otherwise unchanged.

 If I could get an {Acked,Reviewed,Tested}-By in the next few weeks so
 I can send them on to Andrew I would really appreciate it.

Thanks,
NeilBrown

---

NeilBrown (5):
      autofs4: allow RCU-walk to walk through autofs4.
      autofs4: factor should_expire() out of autofs4_expire_indirect.
      autofs4: avoid taking fs_lock during rcu-walk
      autofs4: d_manage() should return -EISDIR when appropriate in rcu-walk mode.
      autofs: the documentation I wanted to read

 Documentation/filesystems/autofs4.txt |  520 +++++++++++++++++++++++++++++++++
 fs/autofs4/autofs_i.h                 |    6 
 fs/autofs4/dev-ioctl.c                |    2 
 fs/autofs4/expire.c                   |  200 ++++++++-----
 fs/autofs4/root.c                     |   62 +++-
 5 files changed, 694 insertions(+), 96 deletions(-)
 create mode 100644 Documentation/filesystems/autofs4.txt

-- 
Signature

^ permalink raw reply	[flat|nested] 17+ messages in thread

* [PATCH 1/5] autofs4: allow RCU-walk to walk through autofs4.
  2014-08-18  6:33 [PATCH 0/5] RCU-walk support for autofs NeilBrown
@ 2014-08-18  6:33 ` NeilBrown
  2014-08-18  6:33 ` [PATCH 5/5] autofs: the documentation I wanted to read NeilBrown
                   ` (4 subsequent siblings)
  5 siblings, 0 replies; 17+ messages in thread
From: NeilBrown @ 2014-08-18  6:33 UTC (permalink / raw)
  To: Ian Kent; +Cc: autofs, linux-kernel

Any attempt to look up a pathname that passes though an
autofs4 mount is currently forced out of RCU-walk into
REF-walk.

This can significantly hurt performance of many-thread work
loads on many-core systems, especially if the automounted
filesystem supports RCU-walk but doesn't get to benefit from
it.

So if autofs4_d_manage is called with rcu_walk set, only
fail with -ECHILD if it is necessary to wait longer than
a spinlock.

Signed-off-by: NeilBrown <neilb@suse.de>
---
 fs/autofs4/autofs_i.h  |    2 +-
 fs/autofs4/dev-ioctl.c |    2 +-
 fs/autofs4/expire.c    |    4 +++-
 fs/autofs4/root.c      |   44 +++++++++++++++++++++++++++++---------------
 4 files changed, 34 insertions(+), 18 deletions(-)

diff --git a/fs/autofs4/autofs_i.h b/fs/autofs4/autofs_i.h
index 9e359fb20c0a..2f1032f12d91 100644
--- a/fs/autofs4/autofs_i.h
+++ b/fs/autofs4/autofs_i.h
@@ -148,7 +148,7 @@ void autofs4_free_ino(struct autofs_info *);
 
 /* Expiration */
 int is_autofs4_dentry(struct dentry *);
-int autofs4_expire_wait(struct dentry *dentry);
+int autofs4_expire_wait(struct dentry *dentry, int rcu_walk);
 int autofs4_expire_run(struct super_block *, struct vfsmount *,
 			struct autofs_sb_info *,
 			struct autofs_packet_expire __user *);
diff --git a/fs/autofs4/dev-ioctl.c b/fs/autofs4/dev-ioctl.c
index 5b570b6efa28..aaf96cb25452 100644
--- a/fs/autofs4/dev-ioctl.c
+++ b/fs/autofs4/dev-ioctl.c
@@ -450,7 +450,7 @@ static int autofs_dev_ioctl_requester(struct file *fp,
 	ino = autofs4_dentry_ino(path.dentry);
 	if (ino) {
 		err = 0;
-		autofs4_expire_wait(path.dentry);
+		autofs4_expire_wait(path.dentry, 0);
 		spin_lock(&sbi->fs_lock);
 		param->requester.uid = from_kuid_munged(current_user_ns(), ino->uid);
 		param->requester.gid = from_kgid_munged(current_user_ns(), ino->gid);
diff --git a/fs/autofs4/expire.c b/fs/autofs4/expire.c
index a7be57e39be7..7e2f22ce6954 100644
--- a/fs/autofs4/expire.c
+++ b/fs/autofs4/expire.c
@@ -467,7 +467,7 @@ found:
 	return expired;
 }
 
-int autofs4_expire_wait(struct dentry *dentry)
+int autofs4_expire_wait(struct dentry *dentry, int rcu_walk)
 {
 	struct autofs_sb_info *sbi = autofs4_sbi(dentry->d_sb);
 	struct autofs_info *ino = autofs4_dentry_ino(dentry);
@@ -477,6 +477,8 @@ int autofs4_expire_wait(struct dentry *dentry)
 	spin_lock(&sbi->fs_lock);
 	if (ino->flags & AUTOFS_INF_EXPIRING) {
 		spin_unlock(&sbi->fs_lock);
+		if (rcu_walk)
+			return -ECHILD;
 
 		DPRINTK("waiting for expire %p name=%.*s",
 			 dentry, dentry->d_name.len, dentry->d_name.name);
diff --git a/fs/autofs4/root.c b/fs/autofs4/root.c
index cdb25ebccc4c..2296c8301b66 100644
--- a/fs/autofs4/root.c
+++ b/fs/autofs4/root.c
@@ -210,7 +210,8 @@ next:
 	return NULL;
 }
 
-static struct dentry *autofs4_lookup_expiring(struct dentry *dentry)
+static struct dentry *autofs4_lookup_expiring(struct dentry *dentry,
+					      bool rcu_walk)
 {
 	struct autofs_sb_info *sbi = autofs4_sbi(dentry->d_sb);
 	struct dentry *parent = dentry->d_parent;
@@ -229,6 +230,11 @@ static struct dentry *autofs4_lookup_expiring(struct dentry *dentry)
 		struct dentry *expiring;
 		struct qstr *qstr;
 
+		if (rcu_walk) {
+			spin_unlock(&sbi->lookup_lock);
+			return ERR_PTR(-ECHILD);
+		}
+
 		ino = list_entry(p, struct autofs_info, expiring);
 		expiring = ino->dentry;
 
@@ -264,13 +270,15 @@ next:
 	return NULL;
 }
 
-static int autofs4_mount_wait(struct dentry *dentry)
+static int autofs4_mount_wait(struct dentry *dentry, bool rcu_walk)
 {
 	struct autofs_sb_info *sbi = autofs4_sbi(dentry->d_sb);
 	struct autofs_info *ino = autofs4_dentry_ino(dentry);
 	int status = 0;
 
 	if (ino->flags & AUTOFS_INF_PENDING) {
+		if (rcu_walk)
+			return -ECHILD;
 		DPRINTK("waiting for mount name=%.*s",
 			dentry->d_name.len, dentry->d_name.name);
 		status = autofs4_wait(sbi, dentry, NFY_MOUNT);
@@ -280,20 +288,22 @@ static int autofs4_mount_wait(struct dentry *dentry)
 	return status;
 }
 
-static int do_expire_wait(struct dentry *dentry)
+static int do_expire_wait(struct dentry *dentry, bool rcu_walk)
 {
 	struct dentry *expiring;
 
-	expiring = autofs4_lookup_expiring(dentry);
+	expiring = autofs4_lookup_expiring(dentry, rcu_walk);
+	if (IS_ERR(expiring))
+		return PTR_ERR(expiring);
 	if (!expiring)
-		return autofs4_expire_wait(dentry);
+		return autofs4_expire_wait(dentry, rcu_walk);
 	else {
 		/*
 		 * If we are racing with expire the request might not
 		 * be quite complete, but the directory has been removed
 		 * so it must have been successful, just wait for it.
 		 */
-		autofs4_expire_wait(expiring);
+		autofs4_expire_wait(expiring, 0);
 		autofs4_del_expiring(expiring);
 		dput(expiring);
 	}
@@ -345,7 +355,7 @@ static struct vfsmount *autofs4_d_automount(struct path *path)
 	 * and the directory was removed, so just go ahead and try
 	 * the mount.
 	 */
-	status = do_expire_wait(dentry);
+	status = do_expire_wait(dentry, 0);
 	if (status && status != -EAGAIN)
 		return NULL;
 
@@ -353,7 +363,7 @@ static struct vfsmount *autofs4_d_automount(struct path *path)
 	spin_lock(&sbi->fs_lock);
 	if (ino->flags & AUTOFS_INF_PENDING) {
 		spin_unlock(&sbi->fs_lock);
-		status = autofs4_mount_wait(dentry);
+		status = autofs4_mount_wait(dentry, 0);
 		if (status)
 			return ERR_PTR(status);
 		goto done;
@@ -394,7 +404,7 @@ static struct vfsmount *autofs4_d_automount(struct path *path)
 		}
 		ino->flags |= AUTOFS_INF_PENDING;
 		spin_unlock(&sbi->fs_lock);
-		status = autofs4_mount_wait(dentry);
+		status = autofs4_mount_wait(dentry, 0);
 		spin_lock(&sbi->fs_lock);
 		ino->flags &= ~AUTOFS_INF_PENDING;
 		if (status) {
@@ -430,21 +440,25 @@ static int autofs4_d_manage(struct dentry *dentry, bool rcu_walk)
 		return 0;
 	}
 
-	/* We need to sleep, so we need pathwalk to be in ref-mode */
-	if (rcu_walk)
-		return -ECHILD;
-
 	/* Wait for pending expires */
-	do_expire_wait(dentry);
+	if (do_expire_wait(dentry, rcu_walk) == -ECHILD)
+		return -ECHILD;
 
 	/*
 	 * This dentry may be under construction so wait on mount
 	 * completion.
 	 */
-	status = autofs4_mount_wait(dentry);
+	status = autofs4_mount_wait(dentry, rcu_walk);
 	if (status)
 		return status;
 
+	if (rcu_walk)
+		/* it is always safe to return 0 as the worst that
+		 * will happen is we retry in REF-walk mode.
+		 * Better than always taking a lock.
+		 */
+		return 0;
+
 	spin_lock(&sbi->fs_lock);
 	/*
 	 * If the dentry has been selected for expire while we slept



^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH 2/5] autofs4: factor should_expire() out of autofs4_expire_indirect.
  2014-08-18  6:33 [PATCH 0/5] RCU-walk support for autofs NeilBrown
                   ` (3 preceding siblings ...)
  2014-08-18  6:33 ` [PATCH 4/5] autofs4: d_manage() should return -EISDIR when appropriate in rcu-walk mode NeilBrown
@ 2014-08-18  6:33 ` NeilBrown
  2014-08-18  8:25 ` [PATCH 0/5] RCU-walk support for autofs Ian Kent
  5 siblings, 0 replies; 17+ messages in thread
From: NeilBrown @ 2014-08-18  6:33 UTC (permalink / raw)
  To: Ian Kent; +Cc: autofs, linux-kernel

Future patch will potentially call this twice, so make it
separate.

Signed-off-by: NeilBrown <neilb@suse.de>
---
 fs/autofs4/expire.c |  162 ++++++++++++++++++++++++++++-----------------------
 1 file changed, 88 insertions(+), 74 deletions(-)

diff --git a/fs/autofs4/expire.c b/fs/autofs4/expire.c
index 7e2f22ce6954..bee939efca2b 100644
--- a/fs/autofs4/expire.c
+++ b/fs/autofs4/expire.c
@@ -345,6 +345,89 @@ out:
 	return NULL;
 }
 
+/* Check if 'dentry' should expire, or return a nearby
+ * dentry that is suitable.
+ * If returned dentry is different from arg dentry,
+ * then a dget() reference was taken, else not.
+ */
+static struct dentry *should_expire(struct dentry *dentry,
+				    struct vfsmount *mnt,
+				    unsigned long timeout,
+				    int how)
+{
+	int do_now = how & AUTOFS_EXP_IMMEDIATE;
+	int exp_leaves = how & AUTOFS_EXP_LEAVES;
+	struct autofs_info *ino = autofs4_dentry_ino(dentry);
+	unsigned int ino_count;
+
+	/* No point expiring a pending mount */
+	if (ino->flags & AUTOFS_INF_PENDING)
+		return NULL;
+
+	/*
+	 * Case 1: (i) indirect mount or top level pseudo direct mount
+	 *	   (autofs-4.1).
+	 *	   (ii) indirect mount with offset mount, check the "/"
+	 *	   offset (autofs-5.0+).
+	 */
+	if (d_mountpoint(dentry)) {
+		DPRINTK("checking mountpoint %p %.*s",
+			dentry, (int)dentry->d_name.len, dentry->d_name.name);
+
+		/* Can we umount this guy */
+		if (autofs4_mount_busy(mnt, dentry))
+			return NULL;
+
+		/* Can we expire this guy */
+		if (autofs4_can_expire(dentry, timeout, do_now))
+			return dentry;
+		return NULL;
+	}
+
+	if (dentry->d_inode && S_ISLNK(dentry->d_inode->i_mode)) {
+		DPRINTK("checking symlink %p %.*s",
+			dentry, (int)dentry->d_name.len, dentry->d_name.name);
+		/*
+		 * A symlink can't be "busy" in the usual sense so
+		 * just check last used for expire timeout.
+		 */
+		if (autofs4_can_expire(dentry, timeout, do_now))
+			return dentry;
+		return NULL;
+	}
+
+	if (simple_empty(dentry))
+		return NULL;
+
+	/* Case 2: tree mount, expire iff entire tree is not busy */
+	if (!exp_leaves) {
+		/* Path walk currently on this dentry? */
+		ino_count = atomic_read(&ino->count) + 1;
+		if (d_count(dentry) > ino_count)
+			return NULL;
+
+		if (!autofs4_tree_busy(mnt, dentry, timeout, do_now))
+			return dentry;
+	/*
+	 * Case 3: pseudo direct mount, expire individual leaves
+	 *	   (autofs-4.1).
+	 */
+	} else {
+		/* Path walk currently on this dentry? */
+		struct dentry *expired;
+		ino_count = atomic_read(&ino->count) + 1;
+		if (d_count(dentry) > ino_count)
+			return NULL;
+
+		expired = autofs4_check_leaves(mnt, dentry, timeout, do_now);
+		if (expired) {
+			if (expired == dentry)
+				dput(dentry);
+			return expired;
+		}
+	}
+	return NULL;
+}
 /*
  * Find an eligible tree to time-out
  * A tree is eligible if :-
@@ -359,11 +442,8 @@ struct dentry *autofs4_expire_indirect(struct super_block *sb,
 	unsigned long timeout;
 	struct dentry *root = sb->s_root;
 	struct dentry *dentry;
-	struct dentry *expired = NULL;
-	int do_now = how & AUTOFS_EXP_IMMEDIATE;
-	int exp_leaves = how & AUTOFS_EXP_LEAVES;
+	struct dentry *expired;
 	struct autofs_info *ino;
-	unsigned int ino_count;
 
 	if (!root)
 		return NULL;
@@ -374,78 +454,12 @@ struct dentry *autofs4_expire_indirect(struct super_block *sb,
 	dentry = NULL;
 	while ((dentry = get_next_positive_subdir(dentry, root))) {
 		spin_lock(&sbi->fs_lock);
-		ino = autofs4_dentry_ino(dentry);
-		/* No point expiring a pending mount */
-		if (ino->flags & AUTOFS_INF_PENDING)
-			goto next;
-
-		/*
-		 * Case 1: (i) indirect mount or top level pseudo direct mount
-		 *	   (autofs-4.1).
-		 *	   (ii) indirect mount with offset mount, check the "/"
-		 *	   offset (autofs-5.0+).
-		 */
-		if (d_mountpoint(dentry)) {
-			DPRINTK("checking mountpoint %p %.*s",
-				dentry, (int)dentry->d_name.len, dentry->d_name.name);
-
-			/* Can we umount this guy */
-			if (autofs4_mount_busy(mnt, dentry))
-				goto next;
-
-			/* Can we expire this guy */
-			if (autofs4_can_expire(dentry, timeout, do_now)) {
-				expired = dentry;
-				goto found;
-			}
-			goto next;
-		}
-
-		if (dentry->d_inode && S_ISLNK(dentry->d_inode->i_mode)) {
-			DPRINTK("checking symlink %p %.*s",
-				dentry, (int)dentry->d_name.len, dentry->d_name.name);
-			/*
-			 * A symlink can't be "busy" in the usual sense so
-			 * just check last used for expire timeout.
-			 */
-			if (autofs4_can_expire(dentry, timeout, do_now)) {
-				expired = dentry;
-				goto found;
-			}
-			goto next;
-		}
-
-		if (simple_empty(dentry))
-			goto next;
-
-		/* Case 2: tree mount, expire iff entire tree is not busy */
-		if (!exp_leaves) {
-			/* Path walk currently on this dentry? */
-			ino_count = atomic_read(&ino->count) + 1;
-			if (d_count(dentry) > ino_count)
-				goto next;
-
-			if (!autofs4_tree_busy(mnt, dentry, timeout, do_now)) {
-				expired = dentry;
-				goto found;
-			}
-		/*
-		 * Case 3: pseudo direct mount, expire individual leaves
-		 *	   (autofs-4.1).
-		 */
-		} else {
-			/* Path walk currently on this dentry? */
-			ino_count = atomic_read(&ino->count) + 1;
-			if (d_count(dentry) > ino_count)
-				goto next;
-
-			expired = autofs4_check_leaves(mnt, dentry, timeout, do_now);
-			if (expired) {
+		expired = should_expire(dentry, mnt, timeout, how);
+		if (expired) {
+			if (expired != dentry)
 				dput(dentry);
-				goto found;
-			}
+			goto found;
 		}
-next:
 		spin_unlock(&sbi->fs_lock);
 	}
 	return NULL;



^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH 4/5] autofs4: d_manage() should return -EISDIR when appropriate in rcu-walk mode.
  2014-08-18  6:33 [PATCH 0/5] RCU-walk support for autofs NeilBrown
                   ` (2 preceding siblings ...)
  2014-08-18  6:33 ` [PATCH 3/5] autofs4: avoid taking fs_lock during rcu-walk NeilBrown
@ 2014-08-18  6:33 ` NeilBrown
  2014-08-18  6:33 ` [PATCH 2/5] autofs4: factor should_expire() out of autofs4_expire_indirect NeilBrown
  2014-08-18  8:25 ` [PATCH 0/5] RCU-walk support for autofs Ian Kent
  5 siblings, 0 replies; 17+ messages in thread
From: NeilBrown @ 2014-08-18  6:33 UTC (permalink / raw)
  To: Ian Kent; +Cc: autofs, linux-kernel

If rcu-walk mode we don't *have* to return -EISDIR for non-mount-traps
as we will simply drop into REF-walk and handling DCACHE_NEED_AUTOMOUNT
dentrys the slow way.  But it is better if we do when possible.

In 'oz_mode', use the same condition as ref-walk: if not a mountpoint,
then it must be -EISDIR.

In regular mode there are most tests needed.  Most of them can be
performed without taking any spinlocks.
If we find a directory that isn't obviously empty, and isn't mounted
on, we need to call 'simple_empty()' which does take a spinlock.
If this turned out to hurt performance, some other approach could
be found to signal when a directory is known to be empty.

Signed-off-by: NeilBrown <neilb@suse.de>
---
 fs/autofs4/root.c |   26 ++++++++++++++++++++------
 1 file changed, 20 insertions(+), 6 deletions(-)

diff --git a/fs/autofs4/root.c b/fs/autofs4/root.c
index 2296c8301b66..71e4413d65c8 100644
--- a/fs/autofs4/root.c
+++ b/fs/autofs4/root.c
@@ -433,8 +433,6 @@ static int autofs4_d_manage(struct dentry *dentry, bool rcu_walk)
 
 	/* The daemon never waits. */
 	if (autofs4_oz_mode(sbi)) {
-		if (rcu_walk)
-			return 0;
 		if (!d_mountpoint(dentry))
 			return -EISDIR;
 		return 0;
@@ -452,12 +450,28 @@ static int autofs4_d_manage(struct dentry *dentry, bool rcu_walk)
 	if (status)
 		return status;
 
-	if (rcu_walk)
-		/* it is always safe to return 0 as the worst that
-		 * will happen is we retry in REF-walk mode.
-		 * Better than always taking a lock.
+	if (rcu_walk) {
+		/* We don't need fs_lock in rcu_walk mode,
+		 * just testing 'AUTOFS_INFO_NO_RCU' is enough.
+		 * simple_empty() takes a spinlock, so leave it
+		 * to last.
+		 * We only return -EISDIR when certain this isn't
+		 * a mount-trap.
 		 */
+		struct inode *inode;
+		if (ino->flags & (AUTOFS_INF_EXPIRING | AUTOFS_INF_NO_RCU))
+			return 0;
+		if (d_mountpoint(dentry))
+			return 0;
+		inode = rcu_dereference(dentry->d_inode);
+		if (inode && S_ISLNK(inode->i_mode))
+			return -EISDIR;
+		if (list_empty(&dentry->d_subdirs))
+			return 0;
+		if (!simple_empty(dentry))
+			return -EISDIR;
 		return 0;
+	}
 
 	spin_lock(&sbi->fs_lock);
 	/*



^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH 5/5] autofs: the documentation I wanted to read
  2014-08-18  6:33 [PATCH 0/5] RCU-walk support for autofs NeilBrown
  2014-08-18  6:33 ` [PATCH 1/5] autofs4: allow RCU-walk to walk through autofs4 NeilBrown
@ 2014-08-18  6:33 ` NeilBrown
  2014-08-18 10:13   ` Ian Kent
  2014-08-18  6:33 ` [PATCH 3/5] autofs4: avoid taking fs_lock during rcu-walk NeilBrown
                   ` (3 subsequent siblings)
  5 siblings, 1 reply; 17+ messages in thread
From: NeilBrown @ 2014-08-18  6:33 UTC (permalink / raw)
  To: Ian Kent; +Cc: autofs, linux-kernel

This documents autofs from the perspective of what the module actually
supports rather than how automount is expected to use it.
It is based mostly on code review and very little on testing so it
may be inaccurate in some places.

The document assumes the functionality added by the RCU-walk patches
that I posted recently.

It is formatted using "markdown" and works best with Markdown.pl
(markdown_py doesn't like some constructs).


Copy-edited-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: NeilBrown <neilb@suse.de>
Acked-by: Ian Kent <raven@themaw.net>
---
 Documentation/filesystems/autofs4.txt |  520 +++++++++++++++++++++++++++++++++
 1 file changed, 520 insertions(+)
 create mode 100644 Documentation/filesystems/autofs4.txt

diff --git a/Documentation/filesystems/autofs4.txt b/Documentation/filesystems/autofs4.txt
new file mode 100644
index 000000000000..ae315e2768d2
--- /dev/null
+++ b/Documentation/filesystems/autofs4.txt
@@ -0,0 +1,520 @@
+<head>
+<style> p { max-width:50em} ol, ul {max-width: 40em}</style>
+</head>
+
+autofs - how it works
+=====================
+
+Purpose
+-------
+
+The goal of autofs is to provide on-demand mounting and race free
+automatic unmounting of various other filesystems.  This provides two
+key advantages:
+
+1. There is no need to delay boot until all filesystems that
+   might be needed are mounted.  Processes that try to access those
+   slow filesystems might be delayed but other processes can
+   continue freely.  This is particularly important for
+   network filesystems (e.g. NFS) or filesystems stored on
+   media with a media-changing robot.
+
+2. The names and locations of filesystems can be stored in
+   a remote database and can change at any time.  The content
+   in that data base at the time of access will be used to provide
+   a target for the access.  The interpretation of names in the
+   filesystem can even be programmatic rather than database-backed,
+   allowing wildcards for example, and can vary based on the user who
+   first accessed a name.
+
+Context
+-------
+
+The "autofs4" filesystem module is only one part of an autofs system.
+There also needs to be a user-space program which looks up names
+and mounts filesystems.  This will often be the "automount" program,
+though other tools including "systemd" can make use of "autofs4".
+This document describes only the kernel module and the interactions
+required with any user-space program.  Subsequent text refers to this
+as the "automount daemon" or simply "the daemon".
+
+"autofs4" is a Linux kernel module with provides the "autofs"
+filesystem type.  Several "autofs" filesystems can be mounted and they
+can each be managed separately, or all managed by the same daemon.
+
+Content
+-------
+
+An autofs filesystem can contain 3 sorts of objects: directories,
+symbolic links and mount traps.  Mount traps are directories with
+extra properties as described in the next section.
+
+Objects can only be created by the automount daemon: symlinks are
+created with a regular `symlink` system call, while directories and
+mount traps are created with `mkdir`.  The determination of whether a
+directory should be a mount trap or not is quite _ad hoc_, largely for
+historical reasons, and is determined in part by the
+*direct*/*indirect*/*offset* mount options, and the *maxproto* mount option.
+
+If neither the *direct* or *offset* mount options are given (so the
+mount is considered to be *indirect*), then the root directory is
+always a regular directory, otherwise it is a mount trap when it is
+empty and a regular directory when not empty.  Note that *direct* and
+*offset* are treated identically so a concise summary is that the root
+directory is a mount trap only if the filesystem is mounted *direct*
+and the root is empty.
+
+Directories created in the root directory are mount traps only if the
+filesystem is mounted  *indirect* and they are empty.
+
+Directories further down the tree depend on the *maxproto* mount
+option and particularly whether it is less than five or not.
+When *maxproto* is five, no directories further down the
+tree are ever mount traps, they are always regular directories.  When
+the *maxproto* is four (or three), these directories are mount traps
+precisely when they are empty.
+
+So: non-empty (i.e. non-leaf) directories are never mount traps. Empty
+directories are sometimes mount traps, and sometimes not depending on
+where in the tree they are (root, top level, or lower), the *maxproto*,
+and whether the mount was *indirect* or not.
+
+Mount Traps
+---------------
+
+A core element of the implementation of autofs is the Mount Traps
+which are provided by the Linux VFS.  Any directory provided by a
+filesystem can be designated as a trap.  This involves two separate
+features that work together to allow autofs to do its job.
+
+**DCACHE_NEED_AUTOMOUNT**
+
+If a dentry has the DCACHE_NEED_AUTOMOUNT flag set (which gets set if
+the inode has S_AUTOMOUNT set, or can be set directly) then it is
+(potentially) a mount trap.  Any access to this directory beyond a
+"`stat`" will (normally) cause the `d_op->d_automount()` dentry operation
+to be called. The task of this method is to find the filesystem that
+should be mounted on the directory and to return it.  The VFS is
+responsible for actually mounting the root of this filesystem on the
+directory.
+
+autofs doesn't find the filesystem itself but sends a message to the
+automount daemon asking it to find and mount the filesystem.  The
+autofs `d_automount` method then waits for the daemon to report that
+everything is ready.  It will then return "`NULL`" indicating that the
+mount has already happened.  The VFS doesn't try to mount anything but
+follows down the mount that is already there.
+
+This functionality is sufficient for some users of mount traps such
+as NFS which creates traps so that mountpoints on the server can be
+reflected on the client.  However it is not sufficient for autofs.  As
+mounting onto a directory is considered to be "beyond a `stat`", the
+automount daemon would not be able to mount a filesystem on the 'trap'
+directory without some way to avoid getting caught in the trap.  For
+that purpose there is another flag.
+
+**DCACHE_MANAGE_TRANSIT**
+
+If a dentry has DCACHE_MANAGE_TRANSIT set then two very different but
+related behaviors are invoked, both using the `d_op->d_manage()`
+dentry operation.
+
+Firstly, before checking to see if any filesystem is mounted on the
+directory, d_manage() will be called with the `rcu_walk` parameter set
+to `false`.  It may return one of three things:
+
+-  A return value of zero indicates that there is nothing special
+   about this dentry and normal checks for mounts and automounts
+   should proceed.
+
+   autofs normally returns zero, but first waits for any
+   expiry (automatic unmounting of the mounted filesystem) to
+   complete.  This avoids races.
+
+-  A return value of `-EISDIR` tells the VFS to ignore any mounts
+   on the directory and to not consider calling `->d_automount()`.
+   This effectively disables the **DCACHE_NEED_AUTOMOUNT** flag
+   causing the directory not be a mount trap after all.
+
+   autofs returns this if it detects that the process performing the
+   lookup is the automount daemon and that the mount has been
+   requested but has not yet completed.  How it determines this is
+   discussed later.  This allows the automount daemon not to get
+   caught in the mount trap.
+
+   There is a subtlety here.  It is possible that a second autofs
+   filesystem can be mounted below the first and for both of them to
+   be managed by the same daemon.  For the daemon to be able to mount
+   something on the second it must be able to "walk" down past the
+   first.  This means that d_manage cannot *always* return -EISDIR for
+   the automount daemon.  It must only return it when a mount has
+   been requested, but has not yet completed.
+
+   `d_manage` also returns `-EISDIR` if the dentry shouldn't be a
+   mount trap, either because it is a symbolic link or because it is
+   not empty.
+
+-  Any other negative value is treated as an error and returned
+   to the caller.
+
+   autofs can return
+
+   - -ENOENT if the automount daemon failed to mount anything,
+   - -ENOMEM if it ran out of memory,
+   - -EINTR if a signal arrived while waiting for expiry to
+     complete
+   - or any other error sent down by the automount daemon.
+
+
+The second use case only occurs during an "RCU-walk" and so `rcu_walk`
+will be set.
+
+An RCU-walk is a fast and lightweight process for walking down a
+filename path (i.e. it is like running on tip-toes).  RCU-walk cannot
+cope with all situations so when it finds a difficulty it falls back
+to "REF-walk", which is slower but more robust.
+
+RCU-walk will never call `->d_automount`; the filesystems must already
+be mounted or RCU-walk cannot handle the path.
+To determine if a mount-trap is safe for RCU-walk mode it calls
+`->d_manage()` with `rcu_walk` set to `true`.
+
+In this case `d_manage()` must avoid blocking and should avoid taking
+spinlocks if at all possible.  Its sole purpose is to determine if it
+would be safe to follow down into any mounted directory and the only
+reason that it might not be is if an expiry of the mount is
+underway.
+
+In the `rcu_walk` case, `d_manage()` cannot return -EISDIR to tell the
+VFS that this is a directory that doesn't require d_automount.  If
+`rcu_walk` sees a dentry with DCACHE_NEED_AUTOMOUNT set but nothing
+mounted, it *will* fall back to REF-walk.  `d_manage()` cannot make the
+VFS remain in RCU-walk mode, but can only tell it to get out of
+RCU-walk mode by returning `-ECHILD`.
+
+So `d_manage()`, when called with `rcu_walk` set, should either return
+-ECHILD if there is any reason to believe it is unsafe to end the
+mounted filesystem, and otherwise should return 0.
+
+autofs will return `-ECHILD` if an expiry of the filesystem has been
+initiated or is being considered, otherwise it returns 0.
+
+
+Mountpoint expiry
+-----------------
+
+The VFS has a mechansim for automatically expiring unused mounts,
+much as it can expire any unused dentry information from the dcache.
+This is guided by the MNT_SHRINKABLE flag.  This  only applies to
+mounts that were created by `d_automount()` returning a filesystem to be
+mounted.  As autofs doesn't return such a filesystem but leaves the
+mounting to the automount daemon, it must involve the automount daemon
+in unmounting as well.  This also means that autofs has more control
+of expiry.
+
+The VFS also supports "expiry" of mounts using the MNT_EXPIRE flag to
+the `umount` system call.  Unmounting with MNT_EXPIRE will fail unless
+a previous attempt had been made, and the filesystem has been inactive
+and untouched since that previous attempt.  autofs4 does not depend on
+this but has its own internal tracking of whether filesystems were
+recently used.  This allows individual names in the autofs directory
+to expire separately.
+
+With version 4 of the protocol, the automount daemon can try to
+unmount any filesystems mounted on the autofs filesystem or remove any
+symbolic links or empty directories any time it likes.  If the unmount
+or removal is successful the filesystem will be returned to the state
+it was before the mount or creation, so that any access of the name
+will trigger normal auto-mount processing.  In particlar, `rmdir` and
+`unlink` do not leave negative entries in the dcache as a normal
+filesystem would, so an attempt to access a recently-removed object is
+passed to autofs for handling.
+
+With version 5, this is not safe except for unmounting from top-level
+directories.  As lower-level directories are never mount traps, other
+processes will see an empty directory as soon as the filesystem is
+unmounted.  So it is generally safest to use the autofs expiry
+protocol described below.
+
+Normally the daemon only wants to remove entries which haven't been
+used for a while.  For this purpose autofs maintains a "`last_used`"
+time stamp on each directory or symlink.  For symlinks it genuinely
+does record the last time the symlink was "used" or followed to find
+out where it points to.  For directories the field is a slight
+misnomer.  It actually records the last time that autofs checked if
+the directory or one of its descendents was busy and found that it
+was.  This is just as useful and doesn't require updating the field so
+often.
+
+The daemon is able to ask autofs if anything is due to be expired,
+using an `ioctl` as discussed later.  For a *direct* mount, autofs
+considers if the entire mount-tree can be unmounted or not.  For an
+*indirect* mount, autofs considers each of the names in the top level
+directory to determine if any of those can be unmounted and cleaned
+up.
+
+There is an option with indirect mounts to consider each of the leaves
+that has been mounted on instead of considering the top-level names.
+This is intended for compatability with version 4 of autofs and should
+be considered as deprecated.
+
+When autofs considers a directory it checks the `last_used` time and
+compares it with the "timeout" value set when the filesystem was
+mounted, though this check is ignored in some cases. It also checks if
+the directory or anything below it is in use.  For symbolic links,
+only the `last_used` time is ever considered.
+
+If both appear to support expiring the directory or symlink, an action
+is taken.
+
+There are two ways to ask autofs to consider expiry.  The first is to
+use the **AUTOFS_IOC_EXPIRE** ioctl.  This only works for indirect
+mounts.  If it finds something in the root directory to expire it will
+return the name of that thing.  Once a name has been returned the
+automount daemon needs to unmount any filesystems mounted below the
+name normally.  As described above, this is unsafe for non-toplevel
+mounts in a version-5 autofs.  For this reason the current `automountd`
+does not use this ioctl.
+
+The second mechanism uses either the **AUTOFS_DEV_IOCTL_EXPIRE_CMD** or
+the **AUTOFS_IOC_EXPIRE_MULTI** ioctl.  This will work for both direct and
+indirect mounts.  If it selects an object to expire, it will notify
+the daemon using the notification mechanism described below.  This
+will block until the daemon acknowledges the expiry notification.
+This implies that the "`EXPIRE`" ioctl must be sent from a different
+thread than the one which handles notification.
+
+While the ioctl is blocking, the entry is marked as "expiring" and
+`d_manage` will block until the daemon affirms that the unmount has
+completed (together with removing any directories that might have been
+necessary), or has been aborted.
+
+Communicating with autofs: detecting the daemon
+-----------------------------------------------
+
+There are several forms of communication between the automount daemon
+and the filesystem.  As we have already seen, the daemon can create and
+remove directories and symlinks using normal filesystem operations.
+autofs knows whether a process requesting some operation is the daemon
+or not based on its process-group id number (see getpgid(1)).
+
+When an autofs filesystem it mounted the pgid of the mounting
+processes is recorded unless the "pgrp=" option is given, in which
+case that number is recorded instead.  Any request arriving from a
+process in that process group is considered to come from the daemon.
+If the daemon ever has to be stopped and restarted a new pgid can be
+provided through an ioctl as will be described below.
+
+Communicating with autofs: the event pipe
+-----------------------------------------
+
+When an autofs filesystem is mounted, the 'write' end of a pipe must
+be passed using the 'fd=' mount option.  autofs will write
+notification messages to this pipe for the daemon to respond to.
+For version 5, the format of the message is:
+
+	struct autofs_v5_packet {
+		int proto_version;		/* Protocol version */
+		int type;			/* Type of packet */
+		autofs_wqt_t wait_queue_token;
+		__u32 dev;
+		__u64 ino;
+		__u32 uid;
+		__u32 gid;
+		__u32 pid;
+		__u32 tgid;
+		__u32 len;
+		char name[NAME_MAX+1];
+	};
+
+where the type is one of
+
+	autofs_ptype_missing_indirect
+	autofs_ptype_expire_indirect
+	autofs_ptype_missing_direct
+	autofs_ptype_expire_direct
+
+so messages can indicate that a name is missing (something tried to
+access it but it isn't there) or that it has been selected for expiry.
+
+The pipe will be set to "packet mode" (equivalent to passing
+`O_DIRECT`) to _pipe2(2)_ so that a read from the pipe will return at
+most one packet, and any unread portion of a packet will be discarded.
+
+The `wait_queue_token` is a unique number which can identify a
+particular request to be acknowledged.  When a message is sent over
+the pipe the affected dentry is marked as either "active" or
+"expiring" and other accesses to it block until the message is
+acknowledged using one of the ioctls below and the relevant
+`wait_queue_token`.
+
+Communicating with autofs: root directory ioctls
+------------------------------------------------
+
+The root directory of an autofs filesystem will respond to a number of
+ioctls.   The process issuing the ioctl must have the CAP_SYS_ADMIN
+capability, or must be the automount daemon.
+
+The available ioctl commands are:
+
+- **AUTOFS_IOC_READY**: a notification has been handled.  The argument
+    to the ioctl command is the "wait_queue_token" number
+    corresponding to the notification being acknowledged.
+- **AUTOFS_IOC_FAIL**: similar to above, but indicates failure with
+    the error code `ENOENT`.
+- **AUTOFS_IOC_CATATONIC**: Causes the autofs to enter "catatonic"
+    mode meaning that it stops sending notifications to the daemon.
+    This mode is also entered if a write to the pipe fails.
+- **AUTOFS_IOC_PROTOVER**:  This returns the protocol version in use.
+- **AUTOFS_IOC_PROTOSUBVER**: Returns the protocol sub-version which
+    is really a version number for the implementation.  It is
+    currently 2.
+- **AUTOFS_IOC_SETTIMEOUT**:  This passes a pointer to an unsigned
+    long.  The value is used to set the timeout for expiry, and
+    the current timeout value is stored back through the pointer.
+- **AUTOFS_IOC_ASKUMOUNT**:  Returns, in the pointed-to `int`, 1 if
+    the filesystem could be unmounted.  This is only a hint as
+    the situation could change at any instant.  This call can be
+    use to avoid a more expensive full unmount attempt.
+- **AUTOFS_IOC_EXPIRE**: as described above, this asks if there is
+    anything suitable to expire.  A pointer to a packet:
+
+        struct autofs_packet_expire_multi {
+        	int proto_version;              /* Protocol version */
+        	int type;                       /* Type of packet */
+        	autofs_wqt_t wait_queue_token;
+        	int len;
+        	char name[NAME_MAX+1];
+        };
+
+     is required.  This is filled in with the name of something
+     that can be unmounted or removed.  If nothing can be expired,
+     `errno` is set to `EAGAIN`.  Even though a `wait_queue_token`
+     is present in the structure, no "wait queue" is established
+     and no acknowledgment is needed.
+- **AUTOFS_IOC_EXPIRE_MULTI**:  This is similar to
+     **AUTOFS_IOC_EXPIRE** except that it causes notification to be
+     sent to the daemon, and it blocks until the daemon acknowledges.
+     The argument is an integer which can contain two different flags.
+
+     **AUTOFS_EXP_IMMEDIATE** causes `last_used` time to be ignored
+     and objects are expired if the are not in use.
+
+     **AUTOFS_EXP_LEAVES** will select a leaf rather than a top-level
+     name to expire.  This is only safe when *maxproto* is 4.
+
+Communicating with autofs: char-device ioctls
+---------------------------------------------
+
+It is not always possible to open the root of an autofs filesystem,
+particularly a *direct* mounted filesystem.  If the automount daemon
+is restarted there is no way for it to regain control of existing
+mounts using any of the above communication channels.  To address this
+need there is a "miscellaneous" character device (major 10, minor 235)
+which can be used to communicate directly with the autofs filesystem.
+It requires CAP_SYS_ADMIN for access.
+
+The `ioctl`s that can be used on this device are described in a separate
+document `autofs4-mount-control.txt`, and are summarized briefly here.
+Each ioctl is passed a pointer to an `autofs_dev_ioctl` structure:
+
+	struct autofs_dev_ioctl {
+		__u32 ver_major;
+		__u32 ver_minor;
+		__u32 size;             /* total size of data passed in
+		                         * including this struct */
+		__s32 ioctlfd;          /* automount command fd */
+
+		__u32 arg1;             /* Command parameters */
+		__u32 arg2;
+
+		char path[0];
+	};
+
+For the **OPEN_MOUNT** and **IS_MOUNTPOINT** commands, the target
+filesystem is identified by the `path`.  All other commands identify
+the filesystem by the `ioctlfd` which is a file descriptor open on the
+root, and which can be returned by **OPEN_MOUNT**.
+
+The `ver_major` and `ver_minor` are in/out parameters which check that
+the requested version is supported, and report the maximum version
+that the kernel module can support.
+
+Commands are:
+
+- **AUTOFS_DEV_IOCTL_VERSION_CMD**: does nothing, except validate and
+    set version numbers.
+- **AUTOFS_DEV_IOCTL_OPENMOUNT_CMD**: return an open file descriptor
+    on the root of an autofs filesystem.  The filesystem is identified
+    by name and device number, which is stored in `arg1`.  Device
+    numbers for existing filesystems can be found in
+    `/proc/self/mountinfo`.
+- **AUTOFS_DEV_IOCTL_CLOSEMOUNT_CMD**: same as `close(ioctlfd)`.
+- **AUTOFS_DEV_IOCTL_SETPIPEFD_CMD**: if the  filesystem is in
+    catatonic mode, this can provide the write end of a new pipe
+    in `arg1` to re-establish communication with a daemon.  The
+    process group of the calling process is used to identify the
+    daemon.
+- **AUTOFS_DEV_IOCTL_REQUESTER_CMD**: `path` should be a
+    name within the filesystem that has been auto-mounted on.
+    arg1 is the dev number of the underlying autofs.  On successful
+    return, `arg1` and `arg2` will be the UID and GID of the process
+    which triggered that mount.
+
+- **AUTOFS_DEV_IOCTL_ISMOUNTPOINT_CMD**: Check if path is a
+    mountpoint of a particular type - see separate documentation for
+    details.
+
+- **AUTOFS_DEV_IOCTL_PROTOVER_CMD**:
+- **AUTOFS_DEV_IOCTL_PROTOSUBVER_CMD**:
+- **AUTOFS_DEV_IOCTL_READY_CMD**:
+- **AUTOFS_DEV_IOCTL_FAIL_CMD**:
+- **AUTOFS_DEV_IOCTL_CATATONIC_CMD**:
+- **AUTOFS_DEV_IOCTL_TIMEOUT_CMD**:
+- **AUTOFS_DEV_IOCTL_EXPIRE_CMD**:
+- **AUTOFS_DEV_IOCTL_ASKUMOUNT_CMD**:  These all have the same
+    function as the similarly named **AUTOFS_IOC** ioctls, except
+    that **FAIL** can be given an explicit error number in `arg1`
+    instead of assuming `ENOENT`, and this **EXPIRE** command
+    corresponds to **AUTOFS_IOC_EXPIRE_MULTI**.
+
+Catatonic mode
+--------------
+
+As mentioned, an autofs mount can enter "catatonic" mode.  This
+happens if a write to the notification pipe fails, or if it is
+explicitly requested by an `ioctl`.
+
+When entering catatonic mode, the pipe is closed and any pending
+notifications are acknowledged with the error `ENOENT`.
+
+Once in catatonic mode attempts to access non-existing names will
+result in `ENOENT` while attempts to access existing directories will
+be treated in the same way as if they came from the daemon, so mount
+traps will not fire.
+
+When the filesystem is mounted a _uid_ and _gid_ can be given which
+set the ownership of directories and symbolic links.  When the
+filesystem is in catatonic mode, any process with a matching UID can
+create directories or symlinks in the root directory, but not in other
+directories.
+
+Catatonic mode can only be left via the
+**AUTOFS_DEV_IOCTL_OPENMOUNT_CMD** ioctl on the `/dev/autofs`.
+
+autofs, name spaces, and shared mounts
+--------------------------------------
+
+With bind mounts and name spaces it is possible for an autofs
+filesystem to appear at multiple places in one or more filesystem
+name spaces.  For this to work sensibly, the autofs filesystem should
+always be mounted "shared". e.g.
+
+> `mount --make-shared /autofs/mount/point`
+
+The automount daemon is only able to mange a single mount location for
+an autofs filesystem and if mounts on that are not 'shared', other
+locations will not behave as expected.  In particular access to those
+other locations will likely result in the `ELOOP` error
+
+> Too many levels of symbolic links



^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH 3/5] autofs4: avoid taking fs_lock during rcu-walk
  2014-08-18  6:33 [PATCH 0/5] RCU-walk support for autofs NeilBrown
  2014-08-18  6:33 ` [PATCH 1/5] autofs4: allow RCU-walk to walk through autofs4 NeilBrown
  2014-08-18  6:33 ` [PATCH 5/5] autofs: the documentation I wanted to read NeilBrown
@ 2014-08-18  6:33 ` NeilBrown
  2014-08-18  6:33 ` [PATCH 4/5] autofs4: d_manage() should return -EISDIR when appropriate in rcu-walk mode NeilBrown
                   ` (2 subsequent siblings)
  5 siblings, 0 replies; 17+ messages in thread
From: NeilBrown @ 2014-08-18  6:33 UTC (permalink / raw)
  To: Ian Kent; +Cc: autofs, linux-kernel

->fs_lock protects AUTOFS_INF_EXPIRING.  We need to be sure
that once the flag is set, no new references beneath the dentry
are taken.  So rcu-walk currently needs to take fs_lock before
checking the flag.  This hurts performance.

Change the expiry to a two-stage process.
First set AUTOFS_INF_NO_RCU which forces any path walk into
ref-walk mode, then drop the lock and call synchronize_rcu().
Once that returns we can be sure no rcu-walk is active beneath
the dentry and we can check reference counts again.

Now during an RCU-walk we can test AUTOFS_INF_EXPIRING without
taking the lock as along as we test AUTOFS_INF_NO_RCU too.
If either are set, we must abort the RCU-walk
If neither are set, we know that refcounts will be tested again
after we finish the RCU-walk so we are safe to continue.

->fs_lock is still taken in d_manage() to check for a non-trap
directory.  That will be resolved in the next patch.

Signed-off-by: NeilBrown <neilb@suse.de>
---
 fs/autofs4/autofs_i.h |    4 ++++
 fs/autofs4/expire.c   |   46 ++++++++++++++++++++++++++++++++++++++--------
 2 files changed, 42 insertions(+), 8 deletions(-)

diff --git a/fs/autofs4/autofs_i.h b/fs/autofs4/autofs_i.h
index 2f1032f12d91..8e98cf954bab 100644
--- a/fs/autofs4/autofs_i.h
+++ b/fs/autofs4/autofs_i.h
@@ -79,6 +79,10 @@ struct autofs_info {
 };
 
 #define AUTOFS_INF_EXPIRING	(1<<0) /* dentry is in the process of expiring */
+#define AUTOFS_INF_NO_RCU	(1<<1) /* the dentry is being considered
+					* for expiry, so RCU_walk is
+					* not permitted
+					*/
 #define AUTOFS_INF_PENDING	(1<<2) /* dentry pending mount */
 
 struct autofs_wait_queue {
diff --git a/fs/autofs4/expire.c b/fs/autofs4/expire.c
index bee939efca2b..eb4b770a4bf6 100644
--- a/fs/autofs4/expire.c
+++ b/fs/autofs4/expire.c
@@ -333,10 +333,19 @@ struct dentry *autofs4_expire_direct(struct super_block *sb,
 	if (ino->flags & AUTOFS_INF_PENDING)
 		goto out;
 	if (!autofs4_direct_busy(mnt, root, timeout, do_now)) {
-		ino->flags |= AUTOFS_INF_EXPIRING;
-		init_completion(&ino->expire_complete);
+		ino->flags |= AUTOFS_INF_NO_RCU;
 		spin_unlock(&sbi->fs_lock);
-		return root;
+		synchronize_rcu();
+		spin_lock(&sbi->fs_lock);
+		if (!autofs4_direct_busy(mnt, root, timeout, do_now)) {
+			ino->flags |= AUTOFS_INF_EXPIRING;
+			smp_mb();
+			ino->flags &= ~AUTOFS_INF_NO_RCU;
+			init_completion(&ino->expire_complete);
+			spin_unlock(&sbi->fs_lock);
+			return root;
+		}
+		ino->flags &= ~AUTOFS_INF_NO_RCU;
 	}
 out:
 	spin_unlock(&sbi->fs_lock);
@@ -454,12 +463,29 @@ struct dentry *autofs4_expire_indirect(struct super_block *sb,
 	dentry = NULL;
 	while ((dentry = get_next_positive_subdir(dentry, root))) {
 		spin_lock(&sbi->fs_lock);
-		expired = should_expire(dentry, mnt, timeout, how);
-		if (expired) {
+		ino = autofs4_dentry_ino(dentry);
+		if (ino->flags & AUTOFS_INF_NO_RCU)
+			expired = NULL;
+		else
+			expired = should_expire(dentry, mnt, timeout, how);
+		if (!expired) {
+			spin_unlock(&sbi->fs_lock);
+			continue;
+		}
+		ino = autofs4_dentry_ino(expired);
+		ino->flags |= AUTOFS_INF_NO_RCU;
+		spin_unlock(&sbi->fs_lock);
+		synchronize_rcu();
+		spin_lock(&sbi->fs_lock);
+		if (should_expire(expired, mnt, timeout, how)) {
 			if (expired != dentry)
 				dput(dentry);
 			goto found;
 		}
+
+		ino->flags &= ~AUTOFS_INF_NO_RCU;
+		if (expired != dentry)
+			dput(expired);
 		spin_unlock(&sbi->fs_lock);
 	}
 	return NULL;
@@ -467,8 +493,9 @@ struct dentry *autofs4_expire_indirect(struct super_block *sb,
 found:
 	DPRINTK("returning %p %.*s",
 		expired, (int)expired->d_name.len, expired->d_name.name);
-	ino = autofs4_dentry_ino(expired);
 	ino->flags |= AUTOFS_INF_EXPIRING;
+	smp_mb();
+	ino->flags &= ~AUTOFS_INF_NO_RCU;
 	init_completion(&ino->expire_complete);
 	spin_unlock(&sbi->fs_lock);
 	spin_lock(&sbi->lookup_lock);
@@ -488,11 +515,14 @@ int autofs4_expire_wait(struct dentry *dentry, int rcu_walk)
 	int status;
 
 	/* Block on any pending expire */
+	if (!(ino->flags & (AUTOFS_INF_EXPIRING | AUTOFS_INF_NO_RCU)))
+		return 0;
+	if (rcu_walk)
+		return -ECHILD;
+
 	spin_lock(&sbi->fs_lock);
 	if (ino->flags & AUTOFS_INF_EXPIRING) {
 		spin_unlock(&sbi->fs_lock);
-		if (rcu_walk)
-			return -ECHILD;
 
 		DPRINTK("waiting for expire %p name=%.*s",
 			 dentry, dentry->d_name.len, dentry->d_name.name);



^ permalink raw reply related	[flat|nested] 17+ messages in thread

* Re: [PATCH 0/5] RCU-walk support for autofs
  2014-08-18  6:33 [PATCH 0/5] RCU-walk support for autofs NeilBrown
                   ` (4 preceding siblings ...)
  2014-08-18  6:33 ` [PATCH 2/5] autofs4: factor should_expire() out of autofs4_expire_indirect NeilBrown
@ 2014-08-18  8:25 ` Ian Kent
  2014-08-19 10:02   ` Ian Kent
  5 siblings, 1 reply; 17+ messages in thread
From: Ian Kent @ 2014-08-18  8:25 UTC (permalink / raw)
  To: NeilBrown; +Cc: autofs, linux-kernel

On Mon, 2014-08-18 at 16:33 +1000, NeilBrown wrote:
> Hi Ian,
>  Have you had a chance to run your tests in these patches yet?
>  I've done what testing I can think of and cannot fault them.

I haven't, I've been plagued with illness so I'm not getting nearly
enough done. I'll try to put a kernel together and run the test in the
next week or so.

> 
>  This set is against 3.17-rc1 and make use of the new -EISDIR handling
>  for d_manage() and assumes the other patches which already went in
>  through Andrew Morton.
> 
>  I've added a section to autofs4.txt about mount namespaces, but it is
>  otherwise unchanged.
> 
>  If I could get an {Acked,Reviewed,Tested}-By in the next few weeks so
>  I can send them on to Andrew I would really appreciate it.
> 
> Thanks,
> NeilBrown
> 
> 
> 
> ---
> 
> NeilBrown (5):
>       autofs4: allow RCU-walk to walk through autofs4.
>       autofs4: factor should_expire() out of autofs4_expire_indirect.
>       autofs4: avoid taking fs_lock during rcu-walk
>       autofs4: d_manage() should return -EISDIR when appropriate in rcu-walk mode.
>       autofs: the documentation I wanted to read
> 
> 
>  Documentation/filesystems/autofs4.txt |  520 +++++++++++++++++++++++++++++++++
>  fs/autofs4/autofs_i.h                 |    6 
>  fs/autofs4/dev-ioctl.c                |    2 
>  fs/autofs4/expire.c                   |  200 ++++++++-----
>  fs/autofs4/root.c                     |   62 +++-
>  5 files changed, 694 insertions(+), 96 deletions(-)
>  create mode 100644 Documentation/filesystems/autofs4.txt
> 



^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 5/5] autofs: the documentation I wanted to read
  2014-08-18  6:33 ` [PATCH 5/5] autofs: the documentation I wanted to read NeilBrown
@ 2014-08-18 10:13   ` Ian Kent
  2014-08-19  7:00     ` NeilBrown
  0 siblings, 1 reply; 17+ messages in thread
From: Ian Kent @ 2014-08-18 10:13 UTC (permalink / raw)
  To: NeilBrown; +Cc: autofs, linux-kernel

On Mon, 2014-08-18 at 16:33 +1000, NeilBrown wrote:
> This documents autofs from the perspective of what the module actually
> supports rather than how automount is expected to use it.
> It is based mostly on code review and very little on testing so it
> may be inaccurate in some places.
> 
> The document assumes the functionality added by the RCU-walk patches
> that I posted recently.
> 
> It is formatted using "markdown" and works best with Markdown.pl
> (markdown_py doesn't like some constructs).
> 
> 
> Copy-edited-by: Randy Dunlap <rdunlap@infradead.org>
> Signed-off-by: NeilBrown <neilb@suse.de>
> Acked-by: Ian Kent <raven@themaw.net>
> ---
>  Documentation/filesystems/autofs4.txt |  520 +++++++++++++++++++++++++++++++++
>  1 file changed, 520 insertions(+)
>  create mode 100644 Documentation/filesystems/autofs4.txt
> 
> diff --git a/Documentation/filesystems/autofs4.txt b/Documentation/filesystems/autofs4.txt
> new file mode 100644
> index 000000000000..ae315e2768d2
> --- /dev/null
> +++ b/Documentation/filesystems/autofs4.txt
> @@ -0,0 +1,520 @@
> +<head>
> +<style> p { max-width:50em} ol, ul {max-width: 40em}</style>
> +</head>
> +
> +autofs - how it works
> +=====================
> +
> +Purpose
> +-------
> +
> +The goal of autofs is to provide on-demand mounting and race free
> +automatic unmounting of various other filesystems.  This provides two
> +key advantages:
> +
> +1. There is no need to delay boot until all filesystems that
> +   might be needed are mounted.  Processes that try to access those
> +   slow filesystems might be delayed but other processes can
> +   continue freely.  This is particularly important for
> +   network filesystems (e.g. NFS) or filesystems stored on
> +   media with a media-changing robot.
> +
> +2. The names and locations of filesystems can be stored in
> +   a remote database and can change at any time.  The content
> +   in that data base at the time of access will be used to provide
> +   a target for the access.  The interpretation of names in the
> +   filesystem can even be programmatic rather than database-backed,
> +   allowing wildcards for example, and can vary based on the user who
> +   first accessed a name.
> +
> +Context
> +-------
> +
> +The "autofs4" filesystem module is only one part of an autofs system.
> +There also needs to be a user-space program which looks up names
> +and mounts filesystems.  This will often be the "automount" program,
> +though other tools including "systemd" can make use of "autofs4".
> +This document describes only the kernel module and the interactions
> +required with any user-space program.  Subsequent text refers to this
> +as the "automount daemon" or simply "the daemon".
> +
> +"autofs4" is a Linux kernel module with provides the "autofs"
> +filesystem type.  Several "autofs" filesystems can be mounted and they
> +can each be managed separately, or all managed by the same daemon.
> +
> +Content
> +-------
> +
> +An autofs filesystem can contain 3 sorts of objects: directories,
> +symbolic links and mount traps.  Mount traps are directories with
> +extra properties as described in the next section.
> +
> +Objects can only be created by the automount daemon: symlinks are
> +created with a regular `symlink` system call, while directories and
> +mount traps are created with `mkdir`.  The determination of whether a
> +directory should be a mount trap or not is quite _ad hoc_, largely for
> +historical reasons, and is determined in part by the
> +*direct*/*indirect*/*offset* mount options, and the *maxproto* mount option.
> +
> +If neither the *direct* or *offset* mount options are given (so the
> +mount is considered to be *indirect*), then the root directory is
> +always a regular directory, otherwise it is a mount trap when it is
> +empty and a regular directory when not empty.  Note that *direct* and
> +*offset* are treated identically so a concise summary is that the root
> +directory is a mount trap only if the filesystem is mounted *direct*
> +and the root is empty.
> +
> +Directories created in the root directory are mount traps only if the
> +filesystem is mounted  *indirect* and they are empty.
> +
> +Directories further down the tree depend on the *maxproto* mount
> +option and particularly whether it is less than five or not.
> +When *maxproto* is five, no directories further down the
> +tree are ever mount traps, they are always regular directories.  When
> +the *maxproto* is four (or three), these directories are mount traps
> +precisely when they are empty.
> +
> +So: non-empty (i.e. non-leaf) directories are never mount traps. Empty
> +directories are sometimes mount traps, and sometimes not depending on
> +where in the tree they are (root, top level, or lower), the *maxproto*,
> +and whether the mount was *indirect* or not.
> +
> +Mount Traps
> +---------------
> +
> +A core element of the implementation of autofs is the Mount Traps
> +which are provided by the Linux VFS.  Any directory provided by a
> +filesystem can be designated as a trap.  This involves two separate
> +features that work together to allow autofs to do its job.
> +
> +**DCACHE_NEED_AUTOMOUNT**
> +
> +If a dentry has the DCACHE_NEED_AUTOMOUNT flag set (which gets set if
> +the inode has S_AUTOMOUNT set, or can be set directly) then it is
> +(potentially) a mount trap.  Any access to this directory beyond a
> +"`stat`" will (normally) cause the `d_op->d_automount()` dentry operation
> +to be called. The task of this method is to find the filesystem that
> +should be mounted on the directory and to return it.  The VFS is
> +responsible for actually mounting the root of this filesystem on the
> +directory.
> +
> +autofs doesn't find the filesystem itself but sends a message to the
> +automount daemon asking it to find and mount the filesystem.  The
> +autofs `d_automount` method then waits for the daemon to report that
> +everything is ready.  It will then return "`NULL`" indicating that the
> +mount has already happened.  The VFS doesn't try to mount anything but
> +follows down the mount that is already there.
> +
> +This functionality is sufficient for some users of mount traps such
> +as NFS which creates traps so that mountpoints on the server can be
> +reflected on the client.  However it is not sufficient for autofs.  As
> +mounting onto a directory is considered to be "beyond a `stat`", the
> +automount daemon would not be able to mount a filesystem on the 'trap'
> +directory without some way to avoid getting caught in the trap.  For
> +that purpose there is another flag.
> +
> +**DCACHE_MANAGE_TRANSIT**
> +
> +If a dentry has DCACHE_MANAGE_TRANSIT set then two very different but
> +related behaviors are invoked, both using the `d_op->d_manage()`
> +dentry operation.
> +
> +Firstly, before checking to see if any filesystem is mounted on the
> +directory, d_manage() will be called with the `rcu_walk` parameter set
> +to `false`.  It may return one of three things:
> +
> +-  A return value of zero indicates that there is nothing special
> +   about this dentry and normal checks for mounts and automounts
> +   should proceed.
> +
> +   autofs normally returns zero, but first waits for any
> +   expiry (automatic unmounting of the mounted filesystem) to
> +   complete.  This avoids races.
> +
> +-  A return value of `-EISDIR` tells the VFS to ignore any mounts
> +   on the directory and to not consider calling `->d_automount()`.
> +   This effectively disables the **DCACHE_NEED_AUTOMOUNT** flag
> +   causing the directory not be a mount trap after all.
> +
> +   autofs returns this if it detects that the process performing the
> +   lookup is the automount daemon and that the mount has been
> +   requested but has not yet completed.  How it determines this is
> +   discussed later.  This allows the automount daemon not to get
> +   caught in the mount trap.
> +
> +   There is a subtlety here.  It is possible that a second autofs
> +   filesystem can be mounted below the first and for both of them to
> +   be managed by the same daemon.  For the daemon to be able to mount
> +   something on the second it must be able to "walk" down past the
> +   first.  This means that d_manage cannot *always* return -EISDIR for
> +   the automount daemon.  It must only return it when a mount has
> +   been requested, but has not yet completed.
> +
> +   `d_manage` also returns `-EISDIR` if the dentry shouldn't be a
> +   mount trap, either because it is a symbolic link or because it is
> +   not empty.
> +
> +-  Any other negative value is treated as an error and returned
> +   to the caller.
> +
> +   autofs can return
> +
> +   - -ENOENT if the automount daemon failed to mount anything,
> +   - -ENOMEM if it ran out of memory,
> +   - -EINTR if a signal arrived while waiting for expiry to
> +     complete
> +   - or any other error sent down by the automount daemon.
> +
> +
> +The second use case only occurs during an "RCU-walk" and so `rcu_walk`
> +will be set.
> +
> +An RCU-walk is a fast and lightweight process for walking down a
> +filename path (i.e. it is like running on tip-toes).  RCU-walk cannot
> +cope with all situations so when it finds a difficulty it falls back
> +to "REF-walk", which is slower but more robust.
> +
> +RCU-walk will never call `->d_automount`; the filesystems must already
> +be mounted or RCU-walk cannot handle the path.
> +To determine if a mount-trap is safe for RCU-walk mode it calls
> +`->d_manage()` with `rcu_walk` set to `true`.
> +
> +In this case `d_manage()` must avoid blocking and should avoid taking
> +spinlocks if at all possible.  Its sole purpose is to determine if it
> +would be safe to follow down into any mounted directory and the only
> +reason that it might not be is if an expiry of the mount is
> +underway.
> +
> +In the `rcu_walk` case, `d_manage()` cannot return -EISDIR to tell the
> +VFS that this is a directory that doesn't require d_automount.  If
> +`rcu_walk` sees a dentry with DCACHE_NEED_AUTOMOUNT set but nothing
> +mounted, it *will* fall back to REF-walk.  `d_manage()` cannot make the
> +VFS remain in RCU-walk mode, but can only tell it to get out of
> +RCU-walk mode by returning `-ECHILD`.
> +
> +So `d_manage()`, when called with `rcu_walk` set, should either return
> +-ECHILD if there is any reason to believe it is unsafe to end the
> +mounted filesystem, and otherwise should return 0.
> +
> +autofs will return `-ECHILD` if an expiry of the filesystem has been
> +initiated or is being considered, otherwise it returns 0.
> +
> +
> +Mountpoint expiry
> +-----------------
> +
> +The VFS has a mechansim for automatically expiring unused mounts,
> +much as it can expire any unused dentry information from the dcache.
> +This is guided by the MNT_SHRINKABLE flag.  This  only applies to
> +mounts that were created by `d_automount()` returning a filesystem to be
> +mounted.  As autofs doesn't return such a filesystem but leaves the
> +mounting to the automount daemon, it must involve the automount daemon
> +in unmounting as well.  This also means that autofs has more control
> +of expiry.
> +
> +The VFS also supports "expiry" of mounts using the MNT_EXPIRE flag to
> +the `umount` system call.  Unmounting with MNT_EXPIRE will fail unless
> +a previous attempt had been made, and the filesystem has been inactive
> +and untouched since that previous attempt.  autofs4 does not depend on
> +this but has its own internal tracking of whether filesystems were
> +recently used.  This allows individual names in the autofs directory
> +to expire separately.
> +
> +With version 4 of the protocol, the automount daemon can try to
> +unmount any filesystems mounted on the autofs filesystem or remove any
> +symbolic links or empty directories any time it likes.  If the unmount
> +or removal is successful the filesystem will be returned to the state
> +it was before the mount or creation, so that any access of the name
> +will trigger normal auto-mount processing.  In particlar, `rmdir` and
> +`unlink` do not leave negative entries in the dcache as a normal
> +filesystem would, so an attempt to access a recently-removed object is
> +passed to autofs for handling.
> +
> +With version 5, this is not safe except for unmounting from top-level
> +directories.  As lower-level directories are never mount traps, other
> +processes will see an empty directory as soon as the filesystem is
> +unmounted.  So it is generally safest to use the autofs expiry
> +protocol described below.
> +
> +Normally the daemon only wants to remove entries which haven't been
> +used for a while.  For this purpose autofs maintains a "`last_used`"
> +time stamp on each directory or symlink.  For symlinks it genuinely
> +does record the last time the symlink was "used" or followed to find
> +out where it points to.  For directories the field is a slight
> +misnomer.  It actually records the last time that autofs checked if
> +the directory or one of its descendents was busy and found that it
> +was.  This is just as useful and doesn't require updating the field so
> +often.
> +
> +The daemon is able to ask autofs if anything is due to be expired,
> +using an `ioctl` as discussed later.  For a *direct* mount, autofs
> +considers if the entire mount-tree can be unmounted or not.  For an
> +*indirect* mount, autofs considers each of the names in the top level
> +directory to determine if any of those can be unmounted and cleaned
> +up.
> +
> +There is an option with indirect mounts to consider each of the leaves
> +that has been mounted on instead of considering the top-level names.
> +This is intended for compatability with version 4 of autofs and should
> +be considered as deprecated.
> +
> +When autofs considers a directory it checks the `last_used` time and
> +compares it with the "timeout" value set when the filesystem was
> +mounted, though this check is ignored in some cases. It also checks if
> +the directory or anything below it is in use.  For symbolic links,
> +only the `last_used` time is ever considered.
> +
> +If both appear to support expiring the directory or symlink, an action
> +is taken.
> +
> +There are two ways to ask autofs to consider expiry.  The first is to
> +use the **AUTOFS_IOC_EXPIRE** ioctl.  This only works for indirect
> +mounts.  If it finds something in the root directory to expire it will
> +return the name of that thing.  Once a name has been returned the
> +automount daemon needs to unmount any filesystems mounted below the
> +name normally.  As described above, this is unsafe for non-toplevel
> +mounts in a version-5 autofs.  For this reason the current `automountd`
> +does not use this ioctl.
> +
> +The second mechanism uses either the **AUTOFS_DEV_IOCTL_EXPIRE_CMD** or
> +the **AUTOFS_IOC_EXPIRE_MULTI** ioctl.  This will work for both direct and
> +indirect mounts.  If it selects an object to expire, it will notify
> +the daemon using the notification mechanism described below.  This
> +will block until the daemon acknowledges the expiry notification.
> +This implies that the "`EXPIRE`" ioctl must be sent from a different
> +thread than the one which handles notification.
> +
> +While the ioctl is blocking, the entry is marked as "expiring" and
> +`d_manage` will block until the daemon affirms that the unmount has
> +completed (together with removing any directories that might have been
> +necessary), or has been aborted.
> +
> +Communicating with autofs: detecting the daemon
> +-----------------------------------------------
> +
> +There are several forms of communication between the automount daemon
> +and the filesystem.  As we have already seen, the daemon can create and
> +remove directories and symlinks using normal filesystem operations.
> +autofs knows whether a process requesting some operation is the daemon
> +or not based on its process-group id number (see getpgid(1)).
> +
> +When an autofs filesystem it mounted the pgid of the mounting
> +processes is recorded unless the "pgrp=" option is given, in which
> +case that number is recorded instead.  Any request arriving from a
> +process in that process group is considered to come from the daemon.
> +If the daemon ever has to be stopped and restarted a new pgid can be
> +provided through an ioctl as will be described below.
> +
> +Communicating with autofs: the event pipe
> +-----------------------------------------
> +
> +When an autofs filesystem is mounted, the 'write' end of a pipe must
> +be passed using the 'fd=' mount option.  autofs will write
> +notification messages to this pipe for the daemon to respond to.
> +For version 5, the format of the message is:
> +
> +	struct autofs_v5_packet {
> +		int proto_version;		/* Protocol version */
> +		int type;			/* Type of packet */
> +		autofs_wqt_t wait_queue_token;
> +		__u32 dev;
> +		__u64 ino;
> +		__u32 uid;
> +		__u32 gid;
> +		__u32 pid;
> +		__u32 tgid;
> +		__u32 len;
> +		char name[NAME_MAX+1];
> +	};
> +
> +where the type is one of
> +
> +	autofs_ptype_missing_indirect
> +	autofs_ptype_expire_indirect
> +	autofs_ptype_missing_direct
> +	autofs_ptype_expire_direct
> +
> +so messages can indicate that a name is missing (something tried to
> +access it but it isn't there) or that it has been selected for expiry.
> +
> +The pipe will be set to "packet mode" (equivalent to passing
> +`O_DIRECT`) to _pipe2(2)_ so that a read from the pipe will return at
> +most one packet, and any unread portion of a packet will be discarded.
> +
> +The `wait_queue_token` is a unique number which can identify a
> +particular request to be acknowledged.  When a message is sent over
> +the pipe the affected dentry is marked as either "active" or
> +"expiring" and other accesses to it block until the message is
> +acknowledged using one of the ioctls below and the relevant
> +`wait_queue_token`.
> +
> +Communicating with autofs: root directory ioctls
> +------------------------------------------------
> +
> +The root directory of an autofs filesystem will respond to a number of
> +ioctls.   The process issuing the ioctl must have the CAP_SYS_ADMIN
> +capability, or must be the automount daemon.
> +
> +The available ioctl commands are:
> +
> +- **AUTOFS_IOC_READY**: a notification has been handled.  The argument
> +    to the ioctl command is the "wait_queue_token" number
> +    corresponding to the notification being acknowledged.
> +- **AUTOFS_IOC_FAIL**: similar to above, but indicates failure with
> +    the error code `ENOENT`.
> +- **AUTOFS_IOC_CATATONIC**: Causes the autofs to enter "catatonic"
> +    mode meaning that it stops sending notifications to the daemon.
> +    This mode is also entered if a write to the pipe fails.
> +- **AUTOFS_IOC_PROTOVER**:  This returns the protocol version in use.
> +- **AUTOFS_IOC_PROTOSUBVER**: Returns the protocol sub-version which
> +    is really a version number for the implementation.  It is
> +    currently 2.
> +- **AUTOFS_IOC_SETTIMEOUT**:  This passes a pointer to an unsigned
> +    long.  The value is used to set the timeout for expiry, and
> +    the current timeout value is stored back through the pointer.
> +- **AUTOFS_IOC_ASKUMOUNT**:  Returns, in the pointed-to `int`, 1 if
> +    the filesystem could be unmounted.  This is only a hint as
> +    the situation could change at any instant.  This call can be
> +    use to avoid a more expensive full unmount attempt.
> +- **AUTOFS_IOC_EXPIRE**: as described above, this asks if there is
> +    anything suitable to expire.  A pointer to a packet:
> +
> +        struct autofs_packet_expire_multi {
> +        	int proto_version;              /* Protocol version */
> +        	int type;                       /* Type of packet */
> +        	autofs_wqt_t wait_queue_token;
> +        	int len;
> +        	char name[NAME_MAX+1];

These fields have leading <spaces><tab> StGit complains about on import.
Think it should be <tab><tab>.

> +        };
> +
> +     is required.  This is filled in with the name of something
> +     that can be unmounted or removed.  If nothing can be expired,
> +     `errno` is set to `EAGAIN`.  Even though a `wait_queue_token`
> +     is present in the structure, no "wait queue" is established
> +     and no acknowledgment is needed.
> +- **AUTOFS_IOC_EXPIRE_MULTI**:  This is similar to
> +     **AUTOFS_IOC_EXPIRE** except that it causes notification to be
> +     sent to the daemon, and it blocks until the daemon acknowledges.
> +     The argument is an integer which can contain two different flags.
> +
> +     **AUTOFS_EXP_IMMEDIATE** causes `last_used` time to be ignored
> +     and objects are expired if the are not in use.
> +
> +     **AUTOFS_EXP_LEAVES** will select a leaf rather than a top-level
> +     name to expire.  This is only safe when *maxproto* is 4.
> +
> +Communicating with autofs: char-device ioctls
> +---------------------------------------------
> +
> +It is not always possible to open the root of an autofs filesystem,
> +particularly a *direct* mounted filesystem.  If the automount daemon
> +is restarted there is no way for it to regain control of existing
> +mounts using any of the above communication channels.  To address this
> +need there is a "miscellaneous" character device (major 10, minor 235)
> +which can be used to communicate directly with the autofs filesystem.
> +It requires CAP_SYS_ADMIN for access.
> +
> +The `ioctl`s that can be used on this device are described in a separate
> +document `autofs4-mount-control.txt`, and are summarized briefly here.
> +Each ioctl is passed a pointer to an `autofs_dev_ioctl` structure:
> +
> +	struct autofs_dev_ioctl {
> +		__u32 ver_major;
> +		__u32 ver_minor;
> +		__u32 size;             /* total size of data passed in
> +		                         * including this struct */
> +		__s32 ioctlfd;          /* automount command fd */
> +
> +		__u32 arg1;             /* Command parameters */
> +		__u32 arg2;
> +
> +		char path[0];
> +	};
> +
> +For the **OPEN_MOUNT** and **IS_MOUNTPOINT** commands, the target
> +filesystem is identified by the `path`.  All other commands identify
> +the filesystem by the `ioctlfd` which is a file descriptor open on the
> +root, and which can be returned by **OPEN_MOUNT**.
> +
> +The `ver_major` and `ver_minor` are in/out parameters which check that
> +the requested version is supported, and report the maximum version
> +that the kernel module can support.
> +
> +Commands are:
> +
> +- **AUTOFS_DEV_IOCTL_VERSION_CMD**: does nothing, except validate and
> +    set version numbers.
> +- **AUTOFS_DEV_IOCTL_OPENMOUNT_CMD**: return an open file descriptor
> +    on the root of an autofs filesystem.  The filesystem is identified
> +    by name and device number, which is stored in `arg1`.  Device
> +    numbers for existing filesystems can be found in
> +    `/proc/self/mountinfo`.
> +- **AUTOFS_DEV_IOCTL_CLOSEMOUNT_CMD**: same as `close(ioctlfd)`.
> +- **AUTOFS_DEV_IOCTL_SETPIPEFD_CMD**: if the  filesystem is in
> +    catatonic mode, this can provide the write end of a new pipe
> +    in `arg1` to re-establish communication with a daemon.  The
> +    process group of the calling process is used to identify the
> +    daemon.
> +- **AUTOFS_DEV_IOCTL_REQUESTER_CMD**: `path` should be a
> +    name within the filesystem that has been auto-mounted on.
> +    arg1 is the dev number of the underlying autofs.  On successful
> +    return, `arg1` and `arg2` will be the UID and GID of the process
> +    which triggered that mount.
> +
> +- **AUTOFS_DEV_IOCTL_ISMOUNTPOINT_CMD**: Check if path is a
> +    mountpoint of a particular type - see separate documentation for
> +    details.
> +
> +- **AUTOFS_DEV_IOCTL_PROTOVER_CMD**:
> +- **AUTOFS_DEV_IOCTL_PROTOSUBVER_CMD**:
> +- **AUTOFS_DEV_IOCTL_READY_CMD**:
> +- **AUTOFS_DEV_IOCTL_FAIL_CMD**:
> +- **AUTOFS_DEV_IOCTL_CATATONIC_CMD**:
> +- **AUTOFS_DEV_IOCTL_TIMEOUT_CMD**:
> +- **AUTOFS_DEV_IOCTL_EXPIRE_CMD**:
> +- **AUTOFS_DEV_IOCTL_ASKUMOUNT_CMD**:  These all have the same
> +    function as the similarly named **AUTOFS_IOC** ioctls, except
> +    that **FAIL** can be given an explicit error number in `arg1`
> +    instead of assuming `ENOENT`, and this **EXPIRE** command
> +    corresponds to **AUTOFS_IOC_EXPIRE_MULTI**.
> +
> +Catatonic mode
> +--------------
> +
> +As mentioned, an autofs mount can enter "catatonic" mode.  This
> +happens if a write to the notification pipe fails, or if it is
> +explicitly requested by an `ioctl`.
> +
> +When entering catatonic mode, the pipe is closed and any pending
> +notifications are acknowledged with the error `ENOENT`.
> +
> +Once in catatonic mode attempts to access non-existing names will
> +result in `ENOENT` while attempts to access existing directories will
> +be treated in the same way as if they came from the daemon, so mount
> +traps will not fire.
> +
> +When the filesystem is mounted a _uid_ and _gid_ can be given which
> +set the ownership of directories and symbolic links.  When the
> +filesystem is in catatonic mode, any process with a matching UID can
> +create directories or symlinks in the root directory, but not in other
> +directories.
> +
> +Catatonic mode can only be left via the
> +**AUTOFS_DEV_IOCTL_OPENMOUNT_CMD** ioctl on the `/dev/autofs`.
> +
> +autofs, name spaces, and shared mounts
> +--------------------------------------
> +
> +With bind mounts and name spaces it is possible for an autofs
> +filesystem to appear at multiple places in one or more filesystem
> +name spaces.  For this to work sensibly, the autofs filesystem should
> +always be mounted "shared". e.g.
> +
> +> `mount --make-shared /autofs/mount/point`
> +
> +The automount daemon is only able to mange a single mount location for
> +an autofs filesystem and if mounts on that are not 'shared', other
> +locations will not behave as expected.  In particular access to those
> +other locations will likely result in the `ELOOP` error
> +
> +> Too many levels of symbolic links
> 
> 



^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 5/5] autofs: the documentation I wanted to read
  2014-08-18 10:13   ` Ian Kent
@ 2014-08-19  7:00     ` NeilBrown
  0 siblings, 0 replies; 17+ messages in thread
From: NeilBrown @ 2014-08-19  7:00 UTC (permalink / raw)
  To: Ian Kent; +Cc: autofs, linux-kernel

[-- Attachment #1: Type: text/plain, Size: 2279 bytes --]

On Mon, 18 Aug 2014 18:13:52 +0800 Ian Kent <raven@themaw.net> wrote:


> > +The available ioctl commands are:
> > +
> > +- **AUTOFS_IOC_READY**: a notification has been handled.  The argument
> > +    to the ioctl command is the "wait_queue_token" number
> > +    corresponding to the notification being acknowledged.
> > +- **AUTOFS_IOC_FAIL**: similar to above, but indicates failure with
> > +    the error code `ENOENT`.
> > +- **AUTOFS_IOC_CATATONIC**: Causes the autofs to enter "catatonic"
> > +    mode meaning that it stops sending notifications to the daemon.
> > +    This mode is also entered if a write to the pipe fails.
> > +- **AUTOFS_IOC_PROTOVER**:  This returns the protocol version in use.
> > +- **AUTOFS_IOC_PROTOSUBVER**: Returns the protocol sub-version which
> > +    is really a version number for the implementation.  It is
> > +    currently 2.
> > +- **AUTOFS_IOC_SETTIMEOUT**:  This passes a pointer to an unsigned
> > +    long.  The value is used to set the timeout for expiry, and
> > +    the current timeout value is stored back through the pointer.
> > +- **AUTOFS_IOC_ASKUMOUNT**:  Returns, in the pointed-to `int`, 1 if
> > +    the filesystem could be unmounted.  This is only a hint as
> > +    the situation could change at any instant.  This call can be
> > +    use to avoid a more expensive full unmount attempt.
> > +- **AUTOFS_IOC_EXPIRE**: as described above, this asks if there is
> > +    anything suitable to expire.  A pointer to a packet:
> > +
> > +        struct autofs_packet_expire_multi {
> > +        	int proto_version;              /* Protocol version */
> > +        	int type;                       /* Type of packet */
> > +        	autofs_wqt_t wait_queue_token;
> > +        	int len;
> > +        	char name[NAME_MAX+1];
> 
> These fields have leading <spaces><tab> StGit complains about on import.
> Think it should be <tab><tab>.
> 

I think I did that because I thought it would work better for markdown
processors (e.g. markdown_py).
However I just tested, and it works best if I turn all the tabs to spaces.
So that is what I have done.

And no rush with testing etc - I just wanted to make sure you had current
code now that -rc1 was out.

Thanks,
NeilBrown

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 828 bytes --]

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 0/5] RCU-walk support for autofs
  2014-08-18  8:25 ` [PATCH 0/5] RCU-walk support for autofs Ian Kent
@ 2014-08-19 10:02   ` Ian Kent
  2014-08-19 11:16     ` NeilBrown
  0 siblings, 1 reply; 17+ messages in thread
From: Ian Kent @ 2014-08-19 10:02 UTC (permalink / raw)
  To: NeilBrown; +Cc: autofs, linux-kernel

On Mon, 2014-08-18 at 16:25 +0800, Ian Kent wrote:
> On Mon, 2014-08-18 at 16:33 +1000, NeilBrown wrote:
> > Hi Ian,
> >  Have you had a chance to run your tests in these patches yet?
> >  I've done what testing I can think of and cannot fault them.
> 
> I haven't, I've been plagued with illness so I'm not getting nearly
> enough done. I'll try to put a kernel together and run the test in the
> next week or so.

Just to let you know that I managed to spend some time on this. I built
a kernel (3.17.0-rc1) with the series and ran a couple of tests.

I'm not certain that the patches I used are identical to your posting, I
saw one difference, after the fact, that shouldn't matter, I'll have to
check that.

It isn't possible to test expire to mount races because the mounts in
the tree never expire.

At first I thought it was because so many processes were accessing the
tree all the time but manually constructing the maps and mounting the
mounts shows that nothing ever expires, at least for this tree.

However, issuing a shut down does expire all the mounts and shuts down
autofs cleanly.

So there is something not quite right with the expire check or my
patches have mistakes.

> 
> > 
> >  This set is against 3.17-rc1 and make use of the new -EISDIR handling
> >  for d_manage() and assumes the other patches which already went in
> >  through Andrew Morton.
> > 
> >  I've added a section to autofs4.txt about mount namespaces, but it is
> >  otherwise unchanged.
> > 
> >  If I could get an {Acked,Reviewed,Tested}-By in the next few weeks so
> >  I can send them on to Andrew I would really appreciate it.
> > 
> > Thanks,
> > NeilBrown
> > 
> > 
> > 
> > ---
> > 
> > NeilBrown (5):
> >       autofs4: allow RCU-walk to walk through autofs4.
> >       autofs4: factor should_expire() out of autofs4_expire_indirect.
> >       autofs4: avoid taking fs_lock during rcu-walk
> >       autofs4: d_manage() should return -EISDIR when appropriate in rcu-walk mode.
> >       autofs: the documentation I wanted to read
> > 
> > 
> >  Documentation/filesystems/autofs4.txt |  520 +++++++++++++++++++++++++++++++++
> >  fs/autofs4/autofs_i.h                 |    6 
> >  fs/autofs4/dev-ioctl.c                |    2 
> >  fs/autofs4/expire.c                   |  200 ++++++++-----
> >  fs/autofs4/root.c                     |   62 +++-
> >  5 files changed, 694 insertions(+), 96 deletions(-)
> >  create mode 100644 Documentation/filesystems/autofs4.txt
> > 
> 
> 
> --
> To unsubscribe from this list: send the line "unsubscribe autofs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html



^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 0/5] RCU-walk support for autofs
  2014-08-19 10:02   ` Ian Kent
@ 2014-08-19 11:16     ` NeilBrown
  2014-08-19 12:30       ` Ian Kent
  2014-08-19 12:36       ` Ian Kent
  0 siblings, 2 replies; 17+ messages in thread
From: NeilBrown @ 2014-08-19 11:16 UTC (permalink / raw)
  To: Ian Kent; +Cc: autofs, linux-kernel

[-- Attachment #1: Type: text/plain, Size: 2182 bytes --]

On Tue, 19 Aug 2014 18:02:27 +0800 Ian Kent <raven@themaw.net> wrote:

> On Mon, 2014-08-18 at 16:25 +0800, Ian Kent wrote:
> > On Mon, 2014-08-18 at 16:33 +1000, NeilBrown wrote:
> > > Hi Ian,
> > >  Have you had a chance to run your tests in these patches yet?
> > >  I've done what testing I can think of and cannot fault them.
> > 
> > I haven't, I've been plagued with illness so I'm not getting nearly
> > enough done. I'll try to put a kernel together and run the test in the
> > next week or so.
> 
> Just to let you know that I managed to spend some time on this. I built
> a kernel (3.17.0-rc1) with the series and ran a couple of tests.
> 
> I'm not certain that the patches I used are identical to your posting, I
> saw one difference, after the fact, that shouldn't matter, I'll have to
> check that.
> 
> It isn't possible to test expire to mount races because the mounts in
> the tree never expire.
> 
> At first I thought it was because so many processes were accessing the
> tree all the time but manually constructing the maps and mounting the
> mounts shows that nothing ever expires, at least for this tree.
> 
> However, issuing a shut down does expire all the mounts and shuts down
> autofs cleanly.
> 
> So there is something not quite right with the expire check or my
> patches have mistakes.

Ahh.. I bet I know what it is.
autofs4_can_expire() isn't idempotent.
Because we call should_expire twice, autofs4_can_expire() is called twice and
the second time it failed because  the first time it resets ->last_used.

diff --git a/fs/autofs4/expire.c b/fs/autofs4/expire.c
index eb4b770a4bf6..80133a9d9427 100644
--- a/fs/autofs4/expire.c
+++ b/fs/autofs4/expire.c
@@ -26,6 +26,9 @@ static inline int autofs4_can_expire(struct dentry *dentry,
 	if (ino == NULL)
 		return 0;
 
+	if (ino->flags & AUTOFS_INF_NO_RCU)
+		/* Already performed this test */
+		return 1;
 	if (!do_now) {
 		/* Too young to die */
 		if (!timeout || time_after(ino->last_used + timeout, now))



might fix it, but might be a hack.
(tests.... yep, that seems to fix it).

I'll think some more tomorrow.

Thanks,
NeilBrown

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 828 bytes --]

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* Re: [PATCH 0/5] RCU-walk support for autofs
  2014-08-19 11:16     ` NeilBrown
@ 2014-08-19 12:30       ` Ian Kent
  2014-08-19 12:36       ` Ian Kent
  1 sibling, 0 replies; 17+ messages in thread
From: Ian Kent @ 2014-08-19 12:30 UTC (permalink / raw)
  To: NeilBrown; +Cc: autofs, linux-kernel

On Tue, 2014-08-19 at 21:16 +1000, NeilBrown wrote:
> On Tue, 19 Aug 2014 18:02:27 +0800 Ian Kent <raven@themaw.net> wrote:
> 
> > On Mon, 2014-08-18 at 16:25 +0800, Ian Kent wrote:
> > > On Mon, 2014-08-18 at 16:33 +1000, NeilBrown wrote:
> > > > Hi Ian,
> > > >  Have you had a chance to run your tests in these patches yet?
> > > >  I've done what testing I can think of and cannot fault them.
> > > 
> > > I haven't, I've been plagued with illness so I'm not getting nearly
> > > enough done. I'll try to put a kernel together and run the test in the
> > > next week or so.
> > 
> > Just to let you know that I managed to spend some time on this. I built
> > a kernel (3.17.0-rc1) with the series and ran a couple of tests.
> > 
> > I'm not certain that the patches I used are identical to your posting, I
> > saw one difference, after the fact, that shouldn't matter, I'll have to
> > check that.
> > 
> > It isn't possible to test expire to mount races because the mounts in
> > the tree never expire.
> > 
> > At first I thought it was because so many processes were accessing the
> > tree all the time but manually constructing the maps and mounting the
> > mounts shows that nothing ever expires, at least for this tree.
> > 
> > However, issuing a shut down does expire all the mounts and shuts down
> > autofs cleanly.
> > 
> > So there is something not quite right with the expire check or my
> > patches have mistakes.
> 
> Ahh.. I bet I know what it is.
> autofs4_can_expire() isn't idempotent.
> Because we call should_expire twice, autofs4_can_expire() is called twice and
> the second time it failed because  the first time it resets ->last_used.

LOL, indeed that looks like it.
I'll carefully check the patches anyway, rebuild and re-test.

> 
> diff --git a/fs/autofs4/expire.c b/fs/autofs4/expire.c
> index eb4b770a4bf6..80133a9d9427 100644
> --- a/fs/autofs4/expire.c
> +++ b/fs/autofs4/expire.c
> @@ -26,6 +26,9 @@ static inline int autofs4_can_expire(struct dentry *dentry,
>  	if (ino == NULL)
>  		return 0;
>  
> +	if (ino->flags & AUTOFS_INF_NO_RCU)
> +		/* Already performed this test */
> +		return 1;
>  	if (!do_now) {
>  		/* Too young to die */
>  		if (!timeout || time_after(ino->last_used + timeout, now))
> 
> 
> 
> might fix it, but might be a hack.
> (tests.... yep, that seems to fix it).
> 
> I'll think some more tomorrow.
> 
> Thanks,
> NeilBrown



^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 0/5] RCU-walk support for autofs
  2014-08-19 11:16     ` NeilBrown
  2014-08-19 12:30       ` Ian Kent
@ 2014-08-19 12:36       ` Ian Kent
  2014-08-20  3:13         ` NeilBrown
  1 sibling, 1 reply; 17+ messages in thread
From: Ian Kent @ 2014-08-19 12:36 UTC (permalink / raw)
  To: NeilBrown; +Cc: autofs, linux-kernel

On Tue, 2014-08-19 at 21:16 +1000, NeilBrown wrote:
> On Tue, 19 Aug 2014 18:02:27 +0800 Ian Kent <raven@themaw.net> wrote:
> 
> > On Mon, 2014-08-18 at 16:25 +0800, Ian Kent wrote:
> > > On Mon, 2014-08-18 at 16:33 +1000, NeilBrown wrote:
> > > > Hi Ian,
> > > >  Have you had a chance to run your tests in these patches yet?
> > > >  I've done what testing I can think of and cannot fault them.
> > > 
> > > I haven't, I've been plagued with illness so I'm not getting nearly
> > > enough done. I'll try to put a kernel together and run the test in the
> > > next week or so.
> > 
> > Just to let you know that I managed to spend some time on this. I built
> > a kernel (3.17.0-rc1) with the series and ran a couple of tests.
> > 
> > I'm not certain that the patches I used are identical to your posting, I
> > saw one difference, after the fact, that shouldn't matter, I'll have to
> > check that.
> > 
> > It isn't possible to test expire to mount races because the mounts in
> > the tree never expire.
> > 
> > At first I thought it was because so many processes were accessing the
> > tree all the time but manually constructing the maps and mounting the
> > mounts shows that nothing ever expires, at least for this tree.
> > 
> > However, issuing a shut down does expire all the mounts and shuts down
> > autofs cleanly.
> > 
> > So there is something not quite right with the expire check or my
> > patches have mistakes.
> 
> Ahh.. I bet I know what it is.
> autofs4_can_expire() isn't idempotent.
> Because we call should_expire twice, autofs4_can_expire() is called twice and
> the second time it failed because  the first time it resets ->last_used.
> 
> diff --git a/fs/autofs4/expire.c b/fs/autofs4/expire.c
> index eb4b770a4bf6..80133a9d9427 100644
> --- a/fs/autofs4/expire.c
> +++ b/fs/autofs4/expire.c
> @@ -26,6 +26,9 @@ static inline int autofs4_can_expire(struct dentry *dentry,
>  	if (ino == NULL)
>  		return 0;
>  
> +	if (ino->flags & AUTOFS_INF_NO_RCU)
> +		/* Already performed this test */
> +		return 1;

Wouldn't it be better to perform this check, or similar, further down
where the last_used time is updated? After all it's only updated to
prevent rapid fire expires on dentrys that refuse to umount for some
reason.

>  	if (!do_now) {
>  		/* Too young to die */
>  		if (!timeout || time_after(ino->last_used + timeout, now))
> 
> 
> 
> might fix it, but might be a hack.
> (tests.... yep, that seems to fix it).
> 
> I'll think some more tomorrow.
> 
> Thanks,
> NeilBrown



^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 0/5] RCU-walk support for autofs
  2014-08-19 12:36       ` Ian Kent
@ 2014-08-20  3:13         ` NeilBrown
  2014-08-20  3:42           ` Ian Kent
  0 siblings, 1 reply; 17+ messages in thread
From: NeilBrown @ 2014-08-20  3:13 UTC (permalink / raw)
  To: Ian Kent; +Cc: autofs, linux-kernel

[-- Attachment #1: Type: text/plain, Size: 4640 bytes --]

On Tue, 19 Aug 2014 20:36:55 +0800 Ian Kent <raven@themaw.net> wrote:

> On Tue, 2014-08-19 at 21:16 +1000, NeilBrown wrote:
> > On Tue, 19 Aug 2014 18:02:27 +0800 Ian Kent <raven@themaw.net> wrote:
> > 
> > > On Mon, 2014-08-18 at 16:25 +0800, Ian Kent wrote:
> > > > On Mon, 2014-08-18 at 16:33 +1000, NeilBrown wrote:
> > > > > Hi Ian,
> > > > >  Have you had a chance to run your tests in these patches yet?
> > > > >  I've done what testing I can think of and cannot fault them.
> > > > 
> > > > I haven't, I've been plagued with illness so I'm not getting nearly
> > > > enough done. I'll try to put a kernel together and run the test in the
> > > > next week or so.
> > > 
> > > Just to let you know that I managed to spend some time on this. I built
> > > a kernel (3.17.0-rc1) with the series and ran a couple of tests.
> > > 
> > > I'm not certain that the patches I used are identical to your posting, I
> > > saw one difference, after the fact, that shouldn't matter, I'll have to
> > > check that.
> > > 
> > > It isn't possible to test expire to mount races because the mounts in
> > > the tree never expire.
> > > 
> > > At first I thought it was because so many processes were accessing the
> > > tree all the time but manually constructing the maps and mounting the
> > > mounts shows that nothing ever expires, at least for this tree.
> > > 
> > > However, issuing a shut down does expire all the mounts and shuts down
> > > autofs cleanly.
> > > 
> > > So there is something not quite right with the expire check or my
> > > patches have mistakes.
> > 
> > Ahh.. I bet I know what it is.
> > autofs4_can_expire() isn't idempotent.
> > Because we call should_expire twice, autofs4_can_expire() is called twice and
> > the second time it failed because  the first time it resets ->last_used.
> > 
> > diff --git a/fs/autofs4/expire.c b/fs/autofs4/expire.c
> > index eb4b770a4bf6..80133a9d9427 100644
> > --- a/fs/autofs4/expire.c
> > +++ b/fs/autofs4/expire.c
> > @@ -26,6 +26,9 @@ static inline int autofs4_can_expire(struct dentry *dentry,
> >  	if (ino == NULL)
> >  		return 0;
> >  
> > +	if (ino->flags & AUTOFS_INF_NO_RCU)
> > +		/* Already performed this test */
> > +		return 1;
> 
> Wouldn't it be better to perform this check, or similar, further down
> where the last_used time is updated? After all it's only updated to
> prevent rapid fire expires on dentrys that refuse to umount for some
> reason.
> 

How about the follow approach instead?
Though I must admit that I find the 'now' global variable feels a bit
clumsy...
Is there a reason for not just using "jiffies" everywhere?

I've created a tag 'autofs4' in git://neil.brown.name/md/ which has this in
with all the others, in case that is useful.

Thanks,
NeilBrown



From 201f75bc25906e8f64e28b37f1bb478958bf2987 Mon Sep 17 00:00:00 2001
From: NeilBrown <neilb@suse.de>
Date: Wed, 20 Aug 2014 12:40:06 +1000
Subject: [PATCH] autofs4: make "autofs4_can_expire" idempotent.

Have a "test" function change the value it is testing can
be confusing, particularly as a future patch will be calling
this function twice.

So move the update for 'last_used' to avoid repeat expiry
to the place where the final determination on what to expire is known.

Signed-off-by: NeilBrown <neilb@suse.de>

diff --git a/fs/autofs4/expire.c b/fs/autofs4/expire.c
index bee939efca2b..af09dada91bc 100644
--- a/fs/autofs4/expire.c
+++ b/fs/autofs4/expire.c
@@ -30,12 +30,6 @@ static inline int autofs4_can_expire(struct dentry *dentry,
 		/* Too young to die */
 		if (!timeout || time_after(ino->last_used + timeout, now))
 			return 0;
-
-		/* update last_used here :-
-		   - obviously makes sense if it is in use now
-		   - less obviously, prevents rapid-fire expire
-		     attempts if expire fails the first time */
-		ino->last_used = now;
 	}
 	return 1;
 }
@@ -541,6 +535,8 @@ int autofs4_expire_run(struct super_block *sb,
 
 	spin_lock(&sbi->fs_lock);
 	ino = autofs4_dentry_ino(dentry);
+	/* avoid rapid-fire expire attempts if expiry fails */
+	ino->last_used = now;
 	ino->flags &= ~AUTOFS_INF_EXPIRING;
 	complete_all(&ino->expire_complete);
 	spin_unlock(&sbi->fs_lock);
@@ -567,6 +563,8 @@ int autofs4_do_expire_multi(struct super_block *sb, struct vfsmount *mnt,
 		ret = autofs4_wait(sbi, dentry, NFY_EXPIRE);
 
 		spin_lock(&sbi->fs_lock);
+		/* avoid rapid-fire expire attempts if expiry fails */
+		ino->last_used = now;
 		ino->flags &= ~AUTOFS_INF_EXPIRING;
 		complete_all(&ino->expire_complete);
 		spin_unlock(&sbi->fs_lock);

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 828 bytes --]

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* Re: [PATCH 0/5] RCU-walk support for autofs
  2014-08-20  3:13         ` NeilBrown
@ 2014-08-20  3:42           ` Ian Kent
  2014-08-20  3:50             ` Ian Kent
  2014-08-20  9:52             ` Ian Kent
  0 siblings, 2 replies; 17+ messages in thread
From: Ian Kent @ 2014-08-20  3:42 UTC (permalink / raw)
  To: NeilBrown; +Cc: autofs, linux-kernel

On Wed, 2014-08-20 at 13:13 +1000, NeilBrown wrote:
> On Tue, 19 Aug 2014 20:36:55 +0800 Ian Kent <raven@themaw.net> wrote:
> 
> > On Tue, 2014-08-19 at 21:16 +1000, NeilBrown wrote:
> > > On Tue, 19 Aug 2014 18:02:27 +0800 Ian Kent <raven@themaw.net> wrote:
> > > 
> > > > On Mon, 2014-08-18 at 16:25 +0800, Ian Kent wrote:
> > > > > On Mon, 2014-08-18 at 16:33 +1000, NeilBrown wrote:
> > > > > > Hi Ian,
> > > > > >  Have you had a chance to run your tests in these patches yet?
> > > > > >  I've done what testing I can think of and cannot fault them.
> > > > > 
> > > > > I haven't, I've been plagued with illness so I'm not getting nearly
> > > > > enough done. I'll try to put a kernel together and run the test in the
> > > > > next week or so.
> > > > 
> > > > Just to let you know that I managed to spend some time on this. I built
> > > > a kernel (3.17.0-rc1) with the series and ran a couple of tests.
> > > > 
> > > > I'm not certain that the patches I used are identical to your posting, I
> > > > saw one difference, after the fact, that shouldn't matter, I'll have to
> > > > check that.
> > > > 
> > > > It isn't possible to test expire to mount races because the mounts in
> > > > the tree never expire.
> > > > 
> > > > At first I thought it was because so many processes were accessing the
> > > > tree all the time but manually constructing the maps and mounting the
> > > > mounts shows that nothing ever expires, at least for this tree.
> > > > 
> > > > However, issuing a shut down does expire all the mounts and shuts down
> > > > autofs cleanly.
> > > > 
> > > > So there is something not quite right with the expire check or my
> > > > patches have mistakes.
> > > 
> > > Ahh.. I bet I know what it is.
> > > autofs4_can_expire() isn't idempotent.
> > > Because we call should_expire twice, autofs4_can_expire() is called twice and
> > > the second time it failed because  the first time it resets ->last_used.
> > > 
> > > diff --git a/fs/autofs4/expire.c b/fs/autofs4/expire.c
> > > index eb4b770a4bf6..80133a9d9427 100644
> > > --- a/fs/autofs4/expire.c
> > > +++ b/fs/autofs4/expire.c
> > > @@ -26,6 +26,9 @@ static inline int autofs4_can_expire(struct dentry *dentry,
> > >  	if (ino == NULL)
> > >  		return 0;
> > >  
> > > +	if (ino->flags & AUTOFS_INF_NO_RCU)
> > > +		/* Already performed this test */
> > > +		return 1;
> > 
> > Wouldn't it be better to perform this check, or similar, further down
> > where the last_used time is updated? After all it's only updated to
> > prevent rapid fire expires on dentrys that refuse to umount for some
> > reason.
> > 
> 
> How about the follow approach instead?

Great minds think alike.
I have basically the exact same patch and I'm testing now.
It probably should be folded into "autofs4: avoid taking fs_lock during
rcu-walk".

It takes ages though, the test has two configurations to test and each
one takes about 70 minutes.

fyi, my criteria for the test is if it runs through to completion once
then that's essentially a pass. If not then there is a race that needs
to be fixed before merge. If it fails on runs two or three then there's
a hard to find race that needs attention but is not serious enough to
block merging. I usually stop after three runs but if fails are seen
later then, they too need to be resolved in the fullness of time.

The reason for this is that in heavy use sites we rarely see more than
about three processes concurrently accessing mount points and the test
run uses 10 client instances (currently on a 6 CPU machine) that all
access the mount tree at the same time so that should be somewhat more
pressure than is seen under heavy use.

And indeed this rule of has proven reliable so far.

> Though I must admit that I find the 'now' global variable feels a bit
> clumsy...
> Is there a reason for not just using "jiffies" everywhere?

Yeah, changing to use jiffies probably won't make any difference.
TBH, I can't remember why I did that.

> 
> I've created a tag 'autofs4' in git://neil.brown.name/md/ which has this in
> with all the others, in case that is useful.

I think I have all the patches but I may need to refer to that at some
point.

> 
> Thanks,
> NeilBrown
> 
> 
> 
> From 201f75bc25906e8f64e28b37f1bb478958bf2987 Mon Sep 17 00:00:00 2001
> From: NeilBrown <neilb@suse.de>
> Date: Wed, 20 Aug 2014 12:40:06 +1000
> Subject: [PATCH] autofs4: make "autofs4_can_expire" idempotent.
> 
> Have a "test" function change the value it is testing can
> be confusing, particularly as a future patch will be calling
> this function twice.
> 
> So move the update for 'last_used' to avoid repeat expiry
> to the place where the final determination on what to expire is known.
> 
> Signed-off-by: NeilBrown <neilb@suse.de>
> 
> diff --git a/fs/autofs4/expire.c b/fs/autofs4/expire.c
> index bee939efca2b..af09dada91bc 100644
> --- a/fs/autofs4/expire.c
> +++ b/fs/autofs4/expire.c
> @@ -30,12 +30,6 @@ static inline int autofs4_can_expire(struct dentry *dentry,
>  		/* Too young to die */
>  		if (!timeout || time_after(ino->last_used + timeout, now))
>  			return 0;
> -
> -		/* update last_used here :-
> -		   - obviously makes sense if it is in use now
> -		   - less obviously, prevents rapid-fire expire
> -		     attempts if expire fails the first time */
> -		ino->last_used = now;
>  	}
>  	return 1;
>  }
> @@ -541,6 +535,8 @@ int autofs4_expire_run(struct super_block *sb,
>  
>  	spin_lock(&sbi->fs_lock);
>  	ino = autofs4_dentry_ino(dentry);
> +	/* avoid rapid-fire expire attempts if expiry fails */
> +	ino->last_used = now;
>  	ino->flags &= ~AUTOFS_INF_EXPIRING;
>  	complete_all(&ino->expire_complete);
>  	spin_unlock(&sbi->fs_lock);
> @@ -567,6 +563,8 @@ int autofs4_do_expire_multi(struct super_block *sb, struct vfsmount *mnt,
>  		ret = autofs4_wait(sbi, dentry, NFY_EXPIRE);
>  
>  		spin_lock(&sbi->fs_lock);
> +		/* avoid rapid-fire expire attempts if expiry fails */
> +		ino->last_used = now;
>  		ino->flags &= ~AUTOFS_INF_EXPIRING;
>  		complete_all(&ino->expire_complete);
>  		spin_unlock(&sbi->fs_lock);



^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 0/5] RCU-walk support for autofs
  2014-08-20  3:42           ` Ian Kent
@ 2014-08-20  3:50             ` Ian Kent
  2014-08-20  9:52             ` Ian Kent
  1 sibling, 0 replies; 17+ messages in thread
From: Ian Kent @ 2014-08-20  3:50 UTC (permalink / raw)
  To: NeilBrown; +Cc: autofs, linux-kernel

On Wed, 2014-08-20 at 11:42 +0800, Ian Kent wrote:
> > 
> > From 201f75bc25906e8f64e28b37f1bb478958bf2987 Mon Sep 17 00:00:00 2001
> > From: NeilBrown <neilb@suse.de>
> > Date: Wed, 20 Aug 2014 12:40:06 +1000
> > Subject: [PATCH] autofs4: make "autofs4_can_expire" idempotent.
> > 
> > Have a "test" function change the value it is testing can
> > be confusing, particularly as a future patch will be calling
> > this function twice.
> > 
> > So move the update for 'last_used' to avoid repeat expiry
> > to the place where the final determination on what to expire is known.
> > 
> > Signed-off-by: NeilBrown <neilb@suse.de>
> > 
> > diff --git a/fs/autofs4/expire.c b/fs/autofs4/expire.c
> > index bee939efca2b..af09dada91bc 100644
> > --- a/fs/autofs4/expire.c
> > +++ b/fs/autofs4/expire.c
> > @@ -30,12 +30,6 @@ static inline int autofs4_can_expire(struct dentry *dentry,
> >  		/* Too young to die */
> >  		if (!timeout || time_after(ino->last_used + timeout, now))
> >  			return 0;
> > -
> > -		/* update last_used here :-
> > -		   - obviously makes sense if it is in use now
> > -		   - less obviously, prevents rapid-fire expire
> > -		     attempts if expire fails the first time */
> > -		ino->last_used = now;
> >  	}
> >  	return 1;
> >  }
> > @@ -541,6 +535,8 @@ int autofs4_expire_run(struct super_block *sb,
> >  
> >  	spin_lock(&sbi->fs_lock);
> >  	ino = autofs4_dentry_ino(dentry);
> > +	/* avoid rapid-fire expire attempts if expiry fails */
> > +	ino->last_used = now;
> >  	ino->flags &= ~AUTOFS_INF_EXPIRING;
> >  	complete_all(&ino->expire_complete);
> >  	spin_unlock(&sbi->fs_lock);
> > @@ -567,6 +563,8 @@ int autofs4_do_expire_multi(struct super_block *sb, struct vfsmount *mnt,
> >  		ret = autofs4_wait(sbi, dentry, NFY_EXPIRE);
> >  
> >  		spin_lock(&sbi->fs_lock);
> > +		/* avoid rapid-fire expire attempts if expiry fails */
> > +		ino->last_used = now;
> >  		ino->flags &= ~AUTOFS_INF_EXPIRING;
> >  		complete_all(&ino->expire_complete);
> >  		spin_unlock(&sbi->fs_lock);
> 

On looking a little closer I've set last_used at slightly different
locations. It is still does the same thing though.

I'm not fussy where we actually do this, so your call.

autofs - fix last_used usage

From: Ian Kent <ikent@redhat.com>

Signed-off-by: Ian Kent <ikent@redhat.com>
---
 fs/autofs4/expire.c |   10 ++++------
 1 file changed, 4 insertions(+), 6 deletions(-)

diff --git a/fs/autofs4/expire.c b/fs/autofs4/expire.c
index cca5d1e..d97a761 100644
--- a/fs/autofs4/expire.c
+++ b/fs/autofs4/expire.c
@@ -30,12 +30,6 @@ static inline int autofs4_can_expire(struct dentry *dentry,
 		/* Too young to die */
 		if (!timeout || time_after(ino->last_used + timeout, now))
 			return 0;
-
-		/* update last_used here :-
-		   - obviously makes sense if it is in use now
-		   - less obviously, prevents rapid-fire expire
-		     attempts if expire fails the first time */
-		ino->last_used = now;
 	}
 	return 1;
 }
@@ -341,6 +335,8 @@ struct dentry *autofs4_expire_direct(struct super_block *sb,
 			ino->flags |= AUTOFS_INF_EXPIRING;
 			smp_mb();
 			ino->flags &= ~AUTOFS_INF_NO_RCU;
+			/* Prevent rapid fire expire events */
+			ino->last_used = now;
 			init_completion(&ino->expire_complete);
 			spin_unlock(&sbi->fs_lock);
 			return root;
@@ -496,6 +492,8 @@ found:
 	ino->flags |= AUTOFS_INF_EXPIRING;
 	smp_mb();
 	ino->flags &= ~AUTOFS_INF_NO_RCU;
+	/* Prevent rapid fire expire events */
+	ino->last_used = now;
 	init_completion(&ino->expire_complete);
 	spin_unlock(&sbi->fs_lock);
 	spin_lock(&sbi->lookup_lock);





^ permalink raw reply related	[flat|nested] 17+ messages in thread

* Re: [PATCH 0/5] RCU-walk support for autofs
  2014-08-20  3:42           ` Ian Kent
  2014-08-20  3:50             ` Ian Kent
@ 2014-08-20  9:52             ` Ian Kent
  1 sibling, 0 replies; 17+ messages in thread
From: Ian Kent @ 2014-08-20  9:52 UTC (permalink / raw)
  To: NeilBrown; +Cc: autofs, linux-kernel

On Wed, 2014-08-20 at 11:42 +0800, Ian Kent wrote:
> Great minds think alike.
> I have basically the exact same patch and I'm testing now.
> It probably should be folded into "autofs4: avoid taking fs_lock during
> rcu-walk".
> 
> It takes ages though, the test has two configurations to test and each
> one takes about 70 minutes.
> 
> fyi, my criteria for the test is if it runs through to completion once
> then that's essentially a pass. If not then there is a race that needs
> to be fixed before merge. If it fails on runs two or three then there's
> a hard to find race that needs attention but is not serious enough to
> block merging. I usually stop after three runs but if fails are seen
> later then, they too need to be resolved in the fullness of time.
> 
> The reason for this is that in heavy use sites we rarely see more than
> about three processes concurrently accessing mount points and the test
> run uses 10 client instances (currently on a 6 CPU machine) that all
> access the mount tree at the same time so that should be somewhat more
> pressure than is seen under heavy use.
> 
> And indeed this rule of has proven reliable so far.

I've run my test three times now.

The first time I had debug logging enabled. I disabled it for the second
and third runs as that sometimes makes a difference.

All three runs went through fine.

The only thing I noticed was that the test took a little longer (two and
a half to three minutes longer, of 70) to complete than without the
patch series but I've seen that sort of variation in kernels at
different times in the past.

So I think the series is fine and should be merged.
Please feel free to add any of my acked, reviewed and tested by to them.

Ian



^ permalink raw reply	[flat|nested] 17+ messages in thread

end of thread, other threads:[~2014-08-20  9:52 UTC | newest]

Thread overview: 17+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2014-08-18  6:33 [PATCH 0/5] RCU-walk support for autofs NeilBrown
2014-08-18  6:33 ` [PATCH 1/5] autofs4: allow RCU-walk to walk through autofs4 NeilBrown
2014-08-18  6:33 ` [PATCH 5/5] autofs: the documentation I wanted to read NeilBrown
2014-08-18 10:13   ` Ian Kent
2014-08-19  7:00     ` NeilBrown
2014-08-18  6:33 ` [PATCH 3/5] autofs4: avoid taking fs_lock during rcu-walk NeilBrown
2014-08-18  6:33 ` [PATCH 4/5] autofs4: d_manage() should return -EISDIR when appropriate in rcu-walk mode NeilBrown
2014-08-18  6:33 ` [PATCH 2/5] autofs4: factor should_expire() out of autofs4_expire_indirect NeilBrown
2014-08-18  8:25 ` [PATCH 0/5] RCU-walk support for autofs Ian Kent
2014-08-19 10:02   ` Ian Kent
2014-08-19 11:16     ` NeilBrown
2014-08-19 12:30       ` Ian Kent
2014-08-19 12:36       ` Ian Kent
2014-08-20  3:13         ` NeilBrown
2014-08-20  3:42           ` Ian Kent
2014-08-20  3:50             ` Ian Kent
2014-08-20  9:52             ` Ian Kent

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox