All of lore.kernel.org
 help / color / mirror / Atom feed
From: Ian Kent <raven@themaw.net>
To: Jim Carter <jimc@math.ucla.edu>
Cc: autofs@linux.kernel.org
Subject: Re: clients suddenly start hanging (was: (no subject))
Date: Thu, 08 May 2008 14:13:28 +0800	[thread overview]
Message-ID: <1210227208.1392.51.camel@raven.themaw.net> (raw)
In-Reply-To: <20080508045235.13C9EF8ED9@serval.math.ucla.edu>


On Wed, 2008-05-07 at 21:52 -0700, Jim Carter wrote:
> On Mon, 28 Apr 2008 14:26:34 +0800 Ian Kent wrote:
> 
> > Jeff Moyer has identified a race in due to an execution order dependency
> > in the autofs4 function root.c:try_to_fill_dentry().
> --snip--
> > After the daemon finishes the mount, it calls back into the kernel
> > to release the waiters. When this happens, P1 is woken up and goes
> > about clearing the DCACHE_AUTOFS_PENDING flag, but it does this in
> > D1!  So, given that P1 in our case is a program that will immediately
> > try to access a file under /mount/submount/foo, we end up finding the
> > dentry D2 which still has the pending flag set, and we set out to
> > wait for a mount *again*!
> 
> I applied the two patches (redo-lookup-in-ttfd and correct-return-in-ttfd)
> and restarted/reloaded the resulting module, but unfortunately it did 
> not improve the issue of hanging client processes in my submount case.

You've jumped the gun at bit. These patches are only part of the work to
resolve these issues. I posted them, since I felt they were sound and
addressed issues that needed to be addressed, in order to meet the merge
window for 2.6.26. There are going to be more kernel changes.

But the patch that may make a real difference for your case is for the
daemon and hasn't been merged yet because were still testing it. I would
have posted it for you to test but I though we were still undecided as
to whether you wanted me to attempt a back port to an older SuSE code
base or you were happy to use the latest source.

Anyway, here it is, against the latest source, for you to test.

---

 daemon/automount.c |   91 +++++++++++++++++++++++++++++-----------------------
 daemon/indirect.c  |   89 +++++++++++++++++++++++++++++++++++++++------------
 daemon/lookup.c    |    3 --
 daemon/state.c     |    5 ++-
 lib/master.c       |    1 +
 5 files changed, 123 insertions(+), 66 deletions(-)


diff --git a/daemon/automount.c b/daemon/automount.c
index 7ce9828..c8b7d1e 100644
--- a/daemon/automount.c
+++ b/daemon/automount.c
@@ -1465,13 +1465,11 @@ static void handle_mounts_cleanup(void *arg)
 	/* If we have been canceled then we may hold the state mutex. */
 	mutex_operation_wait(&ap->state_mutex);
 
-	alarm_delete(ap);
-	st_remove_tasks(ap);
-
-	umount_autofs(ap, 1);
-
 	destroy_logpri_fifo(ap);
-	master_signal_submount(ap, MASTER_SUBMNT_JOIN);
+	if (submount) {
+		master_signal_submount(ap, MASTER_SUBMNT_JOIN);
+		master_source_unlock(ap->parent->entry);
+	}
 	master_remove_mapent(ap->entry);
 	master_free_mapent_sources(ap->entry, 1);
 	master_free_mapent(ap->entry);
@@ -1533,71 +1531,82 @@ void *handle_mounts(void *arg)
 	if (!ap->submount && ap->exp_timeout)
 		alarm_add(ap, ap->exp_runfreq + rand() % ap->exp_runfreq);
 
-	pthread_cleanup_push(handle_mounts_cleanup, ap);
-	pthread_setcancelstate(cancel_state, NULL);
-
 	state_mutex_unlock(ap);
+	pthread_setcancelstate(cancel_state, NULL);
 
 	while (ap->state != ST_SHUTDOWN) {
 		if (handle_packet(ap)) {
-			int ret, result;
+			int ret, cur_state;
+
+			/*
+			 * If we're a submount we need to ensure our parent
+			 * doesn't try to mount us again until our shutdown
+			 * is complete and that any outstanding mounts are
+			 * completed before we try to shutdown.
+			 */
+			pthread_setcancelstate(PTHREAD_CANCEL_DISABLE, &cur_state);
+			if (ap->submount)
+				master_source_writelock(ap->parent->entry);
 
 			state_mutex_lock(ap);
+
+			if (ap->state != ST_SHUTDOWN) {
+				if (!ap->submount)
+					alarm_add(ap, ap->exp_runfreq);
+				state_mutex_unlock(ap);
+				/* Return to ST_READY is done immediately */
+				st_add_task(ap, ST_READY);
+				if (ap->submount)
+					master_source_unlock(ap->parent->entry);
+				pthread_setcancelstate(cur_state, NULL);
+				continue;
+			}
+
+			alarm_delete(ap);
+			st_remove_tasks(ap);
+
 			/*
 			 * For a direct mount map all mounts have already gone
-			 * by the time we get here.
+			 * by the time we get here and since we only ever
+			 * umount direct mounts at shutdown there is no need
+			 * to check for possible recovery.
 			 */
 			if (ap->type == LKP_DIRECT) {
-				status = 1;
+				umount_autofs(ap, 1);
 				state_mutex_unlock(ap);
 				break;
 			}
 
 			/*
-			 * If the ioctl fails assume the kernel doesn't have
-			 * AUTOFS_IOC_ASKUMOUNT and just continue.
+			 * If umount_autofs returns non-zero it wasn't able
+			 * to complete the umount and has left the mount intact
+			 * so we can continue. This can happen if a lookup
+			 * occurs while we're trying to umount.
 			 */
-			ret = ioctl(ap->ioctlfd, AUTOFS_IOC_ASKUMOUNT, &result);
-			if (ret == -1) {
+			ret = umount_autofs(ap, 1);
+			if (!ret) {
 				state_mutex_unlock(ap);
 				break;
 			}
 
-			/* OK to exit */
-			if (ap->state == ST_SHUTDOWN) {
-				if (result) {
-					state_mutex_unlock(ap);
-					break;
-				}
-#ifdef ENABLE_IGNORE_BUSY_MOUNTS
-				/*
-				 * There weren't any active mounts but if the
-				 * filesystem is busy there may be a mount
-				 * request in progress so return to the ready
-				 * state unless a shutdown has been explicitly
-				 * requested.
-				 */
-				if (ap->shutdown) {
-					state_mutex_unlock(ap);
-					break;
-				}
-#endif
-			}
-
 			/* Failed shutdown returns to ready */
 			warn(ap->logopt,
 			     "can't shutdown: filesystem %s still busy",
 			     ap->path);
 			if (!ap->submount)
 				alarm_add(ap, ap->exp_runfreq);
-			nextstate(ap->state_pipe[1], ST_READY);
-
 			state_mutex_unlock(ap);
+			/* Return to ST_READY is done immediately */
+			st_add_task(ap, ST_READY);
+
+			if (ap->submount)
+				master_source_unlock(ap->parent->entry);
+			pthread_setcancelstate(cur_state, NULL);
+
 		}
 	}
 
-	pthread_cleanup_pop(1);
-	sched_yield();
+	handle_mounts_cleanup(ap);
 
 	return NULL;
 }
diff --git a/daemon/indirect.c b/daemon/indirect.c
index 11865b3..c1bd3f2 100644
--- a/daemon/indirect.c
+++ b/daemon/indirect.c
@@ -233,11 +233,8 @@ int mount_autofs_indirect(struct autofs_point *ap)
 	return 0;
 }
 
-int umount_autofs_indirect(struct autofs_point *ap)
+static void close_mount_fds(struct autofs_point *ap)
 {
-	char buf[MAX_ERR_BUF];
-	int ret, rv, retries;
-
 	/*
 	 * Since submounts look after themselves the parent never knows
 	 * it needs to close the ioctlfd for offset mounts so we have
@@ -247,6 +244,25 @@ int umount_autofs_indirect(struct autofs_point *ap)
 	if (ap->submount)
 		lookup_source_close_ioctlfd(ap->parent, ap->path);
 
+	close(ap->state_pipe[0]);
+	close(ap->state_pipe[1]);
+	ap->state_pipe[0] = -1;
+	ap->state_pipe[1] = -1;
+
+	if (ap->pipefd >= 0)
+		close(ap->pipefd);
+
+	if (ap->kpipefd >= 0)
+		close(ap->kpipefd);
+
+	return;
+}
+
+int umount_autofs_indirect(struct autofs_point *ap)
+{
+	char buf[MAX_ERR_BUF];
+	int ret, rv, retries;
+
 	/* If we are trying to shutdown make sure we can umount */
 	rv = ioctl(ap->ioctlfd, AUTOFS_IOC_ASKUMOUNT, &ret);
 	if (rv == -1) {
@@ -255,23 +271,19 @@ int umount_autofs_indirect(struct autofs_point *ap)
 		return 1;
 	} else if (!ret) {
 		error(ap->logopt, "ask umount returned busy %s", ap->path);
+#if defined(ENABLE_IGNORE_BUSY_MOUNTS) || defined(ENABLE_FORCED_SHUTDOWN)
+		if (!ap->shutdown)
+			return 1;
+#else
 		return 1;
+#endif
 	}
 
-	ioctl(ap->ioctlfd, AUTOFS_IOC_CATATONIC, 0);
+	if (ap->shutdown)
+		ioctl(ap->ioctlfd, AUTOFS_IOC_CATATONIC, 0);
+
 	close(ap->ioctlfd);
 	ap->ioctlfd = -1;
-	close(ap->state_pipe[0]);
-	close(ap->state_pipe[1]);
-	ap->state_pipe[0] = -1;
-	ap->state_pipe[1] = -1;
-
-	if (ap->pipefd >= 0)
-		close(ap->pipefd);
-
-	if (ap->kpipefd >= 0)
-		close(ap->kpipefd);
-
 	sched_yield();
 
 	retries = UMOUNT_RETRIES;
@@ -288,24 +300,61 @@ int umount_autofs_indirect(struct autofs_point *ap)
 		case EINVAL:
 			error(ap->logopt,
 			      "mount point %s does not exist", ap->path);
+			close_mount_fds(ap);
 			return 0;
 			break;
 		case EBUSY:
-			error(ap->logopt,
+			debug(ap->logopt,
 			      "mount point %s is in use", ap->path);
-			if (ap->state == ST_SHUTDOWN_FORCE)
+			if (ap->state == ST_SHUTDOWN_FORCE) {
+				close_mount_fds(ap);
 				goto force_umount;
-			else
-				return 0;
+			} else {
+				int cl_flags;
+				/*
+				 * If the umount returns EBUSY there may be
+				 * a mount request in progress so we need to
+				 * recover unless we have been explicitly
+				 * asked to shutdown and configure option
+				 * ENABLE_IGNORE_BUSY_MOUNTS is enabled.
+				 */
+#ifdef ENABLE_IGNORE_BUSY_MOUNTS
+				if (ap->shutdown) {
+					close_mount_fds(ap);
+					return 0;
+				}
+#endif
+				ap->ioctlfd = open(ap->path, O_RDONLY);
+				if (ap->ioctlfd < 0) {
+					warn(ap->logopt,
+					     "could not recover autofs path %s",
+					     ap->path);
+					close_mount_fds(ap);
+					return 0;
+				}
+
+				if ((cl_flags = fcntl(ap->ioctlfd, F_GETFD, 0)) != -1) {
+					cl_flags |= FD_CLOEXEC;
+					fcntl(ap->ioctlfd, F_SETFD, cl_flags);
+				}
+			}
 			break;
 		case ENOTDIR:
 			error(ap->logopt, "mount point is not a directory");
+			close_mount_fds(ap);
 			return 0;
 			break;
 		}
 		return 1;
 	}
 
+	/*
+	 * We have successfully umounted the mount so we now close
+	 * the descriptors. The kernel end of the kernel pipe will
+	 * have been put during the umount super block cleanup.
+	 */
+	close_mount_fds(ap);
+
 force_umount:
 	if (rv != 0) {
 		warn(ap->logopt,
diff --git a/daemon/lookup.c b/daemon/lookup.c
index eac6053..af70fde 100644
--- a/daemon/lookup.c
+++ b/daemon/lookup.c
@@ -1139,8 +1139,6 @@ int lookup_source_close_ioctlfd(struct autofs_point *ap, const char *key)
 	struct mapent *me;
 	int ret = 0;
 
-	pthread_cleanup_push(master_source_lock_cleanup, entry);
-	master_source_readlock(entry);
 	map = entry->maps;
 	while (map) {
 		mc = map->mc;
@@ -1158,7 +1156,6 @@ int lookup_source_close_ioctlfd(struct autofs_point *ap, const char *key)
 		cache_unlock(mc);
 		map = map->next;
 	}
-	pthread_cleanup_pop(1);
 
 	return ret;
 }
diff --git a/daemon/state.c b/daemon/state.c
index 5804707..8a1c29e 100644
--- a/daemon/state.c
+++ b/daemon/state.c
@@ -213,8 +213,10 @@ static unsigned int st_ready(struct autofs_point *ap)
 	debug(ap->logopt,
 	      "st_ready(): state = %d path %s", ap->state, ap->path);
 
+	state_mutex_lock(ap);
 	ap->shutdown = 0;
 	ap->state = ST_READY;
+	state_mutex_unlock(ap);
 
 	if (ap->submount)
 		master_signal_submount(ap, MASTER_SUBMNT_CONTINUE);
@@ -677,9 +679,8 @@ int st_add_task(struct autofs_point *ap, enum states state)
 
 	/* Task termination marker, poke state machine */
 	if (state == ST_READY) {
-		state_mutex_lock(ap);
+		/* NOTE: no state mutex lock here */
 		st_ready(ap);
-		state_mutex_unlock(ap);
 
 		st_mutex_lock();
 
diff --git a/lib/master.c b/lib/master.c
index 4a34dd4..ac9b381 100644
--- a/lib/master.c
+++ b/lib/master.c
@@ -970,6 +970,7 @@ void master_notify_state_change(struct master *master, int sig)
 			if (ap->state != ST_SHUTDOWN_FORCE &&
 			    ap->state != ST_SHUTDOWN_PENDING) {
 				next = ST_SHUTDOWN_FORCE;
+				ap->shutdown = 1;
 				nextstate(state_pipe, next);
 			}
 			break;

  reply	other threads:[~2008-05-08  6:13 UTC|newest]

Thread overview: 51+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2008-04-23 18:50 (no subject) Jim Carter
2008-04-23 20:04 ` Jeff Moyer
2008-04-24  3:10   ` Ian Kent
2008-04-24 16:52   ` clients suddenly start hanging (was: (no subject)) Jim Carter
2008-04-26  1:17   ` Jim Carter
2008-04-26  5:34     ` Ian Kent
2008-04-26 18:48       ` Jim Carter
2008-04-27  5:52         ` Ian Kent
2008-04-26 22:16       ` Jim Carter
2008-04-28  6:26 ` [PATCH 1/2] autofs4 - fix execution order race in mount request code Ian Kent
2008-05-08  4:52   ` clients suddenly start hanging (was: (no subject)) Jim Carter
2008-05-08  6:13     ` Ian Kent [this message]
2008-05-11  4:14       ` Jim Carter
2008-05-11  7:57         ` Ian Kent
2008-05-15 21:59           ` Jim Carter
2008-05-16  3:00             ` Ian Kent
2008-05-18  4:07             ` Ian Kent
2008-05-21  6:58               ` Ian Kent
2008-05-22 21:42               ` Jim Carter
2008-05-23  2:35                 ` Ian Kent
2008-05-26  0:34                   ` Jim Carter
2008-06-12  3:20                     ` Ian Kent
  -- strict thread matches above, loose matches on Subject: below --
2008-06-12  4:50 [PATCH 00/10] Kernel patch series Ian Kent
2008-06-12  4:50 ` [PATCH 01/10] autofs4 - check for invalid dentry in getpath Ian Kent
2008-06-12  4:50 ` [PATCH 02/10] autofs4 - fix sparse warning in waitq.c:autofs4_expire_indirect() Ian Kent
2008-06-12  4:50 ` [PATCH 03/10] autofs4 - fix incorrect return from root.c:try_to_fill_dentry() Ian Kent
2008-06-12  4:51 ` [PATCH 04/10] autofs4 - fix mntput, dput order bug Ian Kent
2008-06-12  4:51 ` [PATCH 05/10] autofs4 - don't make expiring dentry negative Ian Kent
2008-06-12  4:51 ` [PATCH 06/10] autofs4 - use look aside list for lookups Ian Kent
2008-06-12  4:51 ` [PATCH 07/10] autofs4 - don't release directory mutex if called in oz_mode Ian Kent
2008-06-12  4:51 ` [PATCH 08/10] autofs4 - use lookup intent flags to trigger mounts Ian Kent
2008-06-12  4:51 ` [PATCH 09/10] autofs4 - use struct qstr in waitq.c Ian Kent
2008-06-12  4:51 ` [PATCH 10/10] autofs4 - fix pending mount race Ian Kent
2008-06-14  1:13 ` [PATCH 00/10] Kernel patch series Jim Carter
2008-06-14  3:30   ` Ian Kent
2008-06-14  3:42     ` Ian Kent
2008-06-19  0:40       ` clients suddenly start hanging (was: (no subject)) Jim Carter
2008-06-19  3:14         ` Ian Kent
2008-06-19 17:08           ` Jim Carter
2008-06-19 18:34           ` Jim Carter
2008-06-20  4:09             ` Ian Kent
2008-06-21  1:02               ` Jim Carter
2008-06-21  3:12                 ` Ian Kent
2008-06-23  3:49                   ` Jim Carter
2008-06-23  4:46                     ` Ian Kent
2008-06-24  3:08                       ` Ian Kent
2008-06-24 17:02                         ` Stephen Biggs
2008-06-24 23:39                         ` Jim Carter
2008-06-25  3:33                           ` Ian Kent
2008-06-25  5:00                             ` Ian Kent
2008-06-23  4:15                   ` Ian Kent

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=1210227208.1392.51.camel@raven.themaw.net \
    --to=raven@themaw.net \
    --cc=autofs@linux.kernel.org \
    --cc=jimc@math.ucla.edu \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.