Linux NFS development
* [PATCH 0/6 nfs-utils v2] fixes for error handling in nfsd_fh
@ 2023-10-23  1:58 NeilBrown
  2023-10-23  1:58 ` [PATCH 1/6] export: fix handling of error from match_fsid() NeilBrown
                   ` (6 more replies)
  0 siblings, 7 replies; 8+ messages in thread
From: NeilBrown @ 2023-10-23  1:58 UTC (permalink / raw)
  To: Steve Dickson; +Cc: linux-nfs, Trond Myklebust

Hi,
 this is a revised version of my previous series with the same name.
 The first two patches are unchanged.
 The third patch, which was an RFC, has been replaced with the last
 patch which actually addresses the issue rather than skirting 
 around it.

 Patch 3 here is a revert of a change I noticed while exploring the
 code.  cache_open() must be called BEFORE forking workers, as explained
 in that patch.
 Patches 4 and 5 factor out common code, which makes the final patch
 simpler.

 The core issue is that sometimes mountd (or exportd) cannot give a
 definite "yes" or "no" to a request to map an fsid to a path name.
 In these cases the only safe option is to delay and try again.

 This only becomes relevant if a filesystem is mounted by a client, then
 the server restarts (or the export cache is flushed) and the client
 tries to use a filehandle that it already has, but that server cannot
 find it and cannot be sure it doesn't exist.  This can happen when an
 export is marked "mountpoint" or when a re-exported NFS filesystem
 cannot contact the server and reports an ETIMEDOUT error.  In these
 cases we want the client to continue waiting (which it does) and also
 want mountd/exportd to periodically check if the target filesystem has
 come back (which it currently does not).  
 With the current code, once this situation happens and the client is
 waiting, the client will continue to wait indefinitely even if the
 target filesystem becomes available.  The client can only continue if
 the NFS server is restarted or the export cache is flushed.  After the
 patch, within 2 minutes of the target filesystem becoming
 available again, mountd will tell the kernel, and when the client asks
 again it will be allowed to proceed.

NeilBrown


 [PATCH 1/6] export: fix handling of error from match_fsid()
 [PATCH 2/6] export: add EACCES to the list of known
 [PATCH 3/6] export: move cache_open() before workers are forked.
 [PATCH 4/6] Move fork_workers() and wait_for_workers() in cache.c
 [PATCH 5/6] Share process_loop code between mountd and exportd.
 [PATCH 6/6] cache: periodically retry requests that couldn't be


^ permalink raw reply	[flat|nested] 8+ messages in thread

* [PATCH 1/6] export: fix handling of error from match_fsid()
  2023-10-23  1:58 [PATCH 0/6 nfs-utils v2] fixes for error handling in nfsd_fh NeilBrown
@ 2023-10-23  1:58 ` NeilBrown
  2023-10-23  1:58 ` [PATCH 2/6] export: add EACCES to the list of known path_lookup_error() errors NeilBrown
                   ` (5 subsequent siblings)
  6 siblings, 0 replies; 8+ messages in thread
From: NeilBrown @ 2023-10-23  1:58 UTC (permalink / raw)
  To: Steve Dickson; +Cc: linux-nfs, Trond Myklebust

If match_fsid() returns -1 we cannot tell whether the path matches the
fsid, so we shouldn't assume that it definitely doesn't.
This is a similar situation to where an export is expected to be a mount
point, but is found not to be one.  So it can be handled the same way,
by setting 'dev_missing'.
This will only have an effect if no other path matched the fsid, which
is what we want.

The current code results in nothing being exported if any export point,
or any mount point beneath a crossmnt export point, fails a 'stat'
request, which is too harsh.

Signed-off-by: NeilBrown <neilb@suse.de>
---
 support/export/cache.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/support/export/cache.c b/support/export/cache.c
index 19bbba556060..e4595020f43f 100644
--- a/support/export/cache.c
+++ b/support/export/cache.c
@@ -858,7 +858,8 @@ static void nfsd_fh(int f)
 			case 0:
 				continue;
 			case -1:
-				goto out;
+				dev_missing ++;
+				continue;
 			}
 			if (is_ipaddr_client(dom)
 					&& !ipaddr_client_matches(exp, ai))
-- 
2.42.0



* [PATCH 2/6] export: add EACCES to the list of known path_lookup_error() errors.
  2023-10-23  1:58 [PATCH 0/6 nfs-utils v2] fixes for error handling in nfsd_fh NeilBrown
  2023-10-23  1:58 ` [PATCH 1/6] export: fix handling of error from match_fsid() NeilBrown
@ 2023-10-23  1:58 ` NeilBrown
  2023-10-23  1:58 ` [PATCH 3/6] export: move cache_open() before workers are forked NeilBrown
                   ` (4 subsequent siblings)
  6 siblings, 0 replies; 8+ messages in thread
From: NeilBrown @ 2023-10-23  1:58 UTC (permalink / raw)
  To: Steve Dickson; +Cc: linux-nfs, Trond Myklebust

If a 'stat' results in EACCES (for root), then it is likely a permanent
problem.  One possible cause is a 'fuse' filesystem which only gives
access to the user who mounted it.

So it is reasonable for EACCES to be a "path lookup error".

Signed-off-by: NeilBrown <neilb@suse.de>
---
 support/export/cache.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/support/export/cache.c b/support/export/cache.c
index e4595020f43f..5307f6c8d872 100644
--- a/support/export/cache.c
+++ b/support/export/cache.c
@@ -77,6 +77,7 @@ static bool path_lookup_error(int err)
 	case ENAMETOOLONG:
 	case ENOENT:
 	case ENOTDIR:
+	case EACCES:
 		return 1;
 	}
 	return 0;
-- 
2.42.0



* [PATCH 3/6] export: move cache_open() before workers are forked.
  2023-10-23  1:58 [PATCH 0/6 nfs-utils v2] fixes for error handling in nfsd_fh NeilBrown
  2023-10-23  1:58 ` [PATCH 1/6] export: fix handling of error from match_fsid() NeilBrown
  2023-10-23  1:58 ` [PATCH 2/6] export: add EACCES to the list of known path_lookup_error() errors NeilBrown
@ 2023-10-23  1:58 ` NeilBrown
  2023-10-23  1:58 ` [PATCH 4/6] Move fork_workers() and wait_for_workers() in cache.c NeilBrown
                   ` (3 subsequent siblings)
  6 siblings, 0 replies; 8+ messages in thread
From: NeilBrown @ 2023-10-23  1:58 UTC (permalink / raw)
  To: Steve Dickson; +Cc: linux-nfs, Trond Myklebust

If each worker has a separate open on a cache channel, then each worker
will potentially receive every upcall request resulting in duplicated
work.

The only time a worker does not see a request that another worker sees
is when that other worker answers the request before this worker gets a
chance to read it.

To avoid duplicate effort between threads and so get maximum benefit
from multiple threads, open the cache channels before forking.

Note that the kernel provides locking so that only one thread can be
reading or writing to any channel at any given moment.

Fixes: 5fc3bac9e0c3 ("mountd: Ensure we don't share cache file descriptors among processes.")
Signed-off-by: NeilBrown <neilb@suse.de>
---
 utils/exportd/exportd.c | 8 ++++++--
 utils/mountd/mountd.c   | 8 ++++++--
 2 files changed, 12 insertions(+), 4 deletions(-)

diff --git a/utils/exportd/exportd.c b/utils/exportd/exportd.c
index 2dd12cb6015b..6f866445efc2 100644
--- a/utils/exportd/exportd.c
+++ b/utils/exportd/exportd.c
@@ -289,12 +289,16 @@ main(int argc, char **argv)
 	else if (num_threads > MAX_THREADS)
 		num_threads = MAX_THREADS;
 
+	/* Open cache channel files BEFORE forking so each upcall is
+	 * only handled by one thread.  Kernel provides locking for both
+	 * read and write.
+	 */
+	cache_open();
+
 	if (num_threads > 1)
 		fork_workers();
 
 
-	/* Open files now to avoid sharing descriptors among forked processes */
-	cache_open();
 	v4clients_init();
 
 	/* Process incoming upcalls */
diff --git a/utils/mountd/mountd.c b/utils/mountd/mountd.c
index bcf749fabbb3..f9c62cded66c 100644
--- a/utils/mountd/mountd.c
+++ b/utils/mountd/mountd.c
@@ -916,12 +916,16 @@ main(int argc, char **argv)
 	else if (num_threads > MAX_THREADS)
 		num_threads = MAX_THREADS;
 
+	/* Open cache channel files BEFORE forking so each upcall is
+	 * only handled by one thread.  Kernel provides locking for both
+	 * read and write.
+	 */
+	cache_open();
+
 	if (num_threads > 1)
 		fork_workers();
 
 	nfsd_path_init();
-	/* Open files now to avoid sharing descriptors among forked processes */
-	cache_open();
 	v4clients_init();
 
 	xlog(L_NOTICE, "Version " VERSION " starting");
-- 
2.42.0



* [PATCH 4/6] Move fork_workers() and wait_for_workers() in cache.c
  2023-10-23  1:58 [PATCH 0/6 nfs-utils v2] fixes for error handling in nfsd_fh NeilBrown
                   ` (2 preceding siblings ...)
  2023-10-23  1:58 ` [PATCH 3/6] export: move cache_open() before workers are forked NeilBrown
@ 2023-10-23  1:58 ` NeilBrown
  2023-10-23  1:58 ` [PATCH 5/6] Share process_loop code between mountd and exportd NeilBrown
                   ` (2 subsequent siblings)
  6 siblings, 0 replies; 8+ messages in thread
From: NeilBrown @ 2023-10-23  1:58 UTC (permalink / raw)
  To: Steve Dickson; +Cc: linux-nfs, Trond Myklebust

Both mountd and exportd have fork_workers() and wait_for_workers()
which are nearly identical.
Move this code into cache.c (adding a cache_ prefix to the function
names) and leave the minor differences in the two callers.

Also remove duplicate declarations from mountd.h.

Signed-off-by: NeilBrown <neilb@suse.de>
---
 support/export/cache.c  | 75 ++++++++++++++++++++++++++++++++-
 support/export/export.h |  2 +
 utils/exportd/exportd.c | 90 ++++++----------------------------------
 utils/mountd/mountd.c   | 91 ++++++-----------------------------------
 utils/mountd/mountd.h   |  9 ----
 5 files changed, 99 insertions(+), 168 deletions(-)

diff --git a/support/export/cache.c b/support/export/cache.c
index 5307f6c8d872..1874156af5e5 100644
--- a/support/export/cache.c
+++ b/support/export/cache.c
@@ -1,10 +1,9 @@
-
 /*
  * Handle communication with knfsd internal cache
  *
  * We open /proc/net/rpc/{auth.unix.ip,nfsd.export,nfsd.fh}/channel
  * and listen for requests (using my_svc_run)
- * 
+ *
  */
 
 #ifdef HAVE_CONFIG_H
@@ -16,6 +15,7 @@
 #include <sys/select.h>
 #include <sys/stat.h>
 #include <sys/vfs.h>
+#include <sys/wait.h>
 #include <time.h>
 #include <netinet/in.h>
 #include <arpa/inet.h>
@@ -1775,3 +1775,74 @@ cache_get_filehandle(nfs_export *exp, int len, char *p)
 	fh.fh_size = qword_get(&bp, (char *)fh.fh_handle, NFS3_FHSIZE);
 	return &fh;
 }
+
+/* Wait for all worker child processes to exit and reap them */
+void
+cache_wait_for_workers(char *prog)
+{
+	int status;
+	pid_t pid;
+
+	for (;;) {
+
+		pid = waitpid(0, &status, 0);
+
+		if (pid < 0) {
+			if (errno == ECHILD)
+				return; /* no more children */
+			xlog(L_FATAL, "%s: can't wait: %s\n", prog,
+					strerror(errno));
+		}
+
+		/* Note: because we SIG_IGN'd SIGCHLD earlier, this
+		 * does not happen on 2.6 kernels, and waitpid() blocks
+		 * until all the children are dead then returns with
+		 * -ECHILD.  But, we don't need to do anything on the
+		 * death of individual workers, so we don't care. */
+		xlog(L_NOTICE, "%s: reaped child %d, status %d\n",
+		     prog, (int)pid, status);
+	}
+}
+
+/* Fork num_threads worker children and wait for them */
+int
+cache_fork_workers(char *prog, int num_threads)
+{
+	int i;
+	pid_t pid;
+
+	if (num_threads <= 1)
+		return 1;
+
+	xlog(L_NOTICE, "%s: starting %d threads\n", prog, num_threads);
+
+	for (i = 0 ; i < num_threads ; i++) {
+		pid = fork();
+		if (pid < 0) {
+			xlog(L_FATAL, "%s: cannot fork: %s\n", prog,
+					strerror(errno));
+		}
+		if (pid == 0) {
+			/* worker child */
+
+			/* Re-enable the default action on SIGTERM et al
+			 * so that workers die naturally when sent them.
+			 * Only the parent unregisters with pmap and
+			 * hence needs to do special SIGTERM handling. */
+			struct sigaction sa;
+			sa.sa_handler = SIG_DFL;
+			sa.sa_flags = 0;
+			sigemptyset(&sa.sa_mask);
+			sigaction(SIGHUP, &sa, NULL);
+			sigaction(SIGINT, &sa, NULL);
+			sigaction(SIGTERM, &sa, NULL);
+
+			/* fall into my_svc_run in caller */
+			return 1;
+		}
+	}
+
+	/* in parent */
+	cache_wait_for_workers(prog);
+	return 0;
+}
diff --git a/support/export/export.h b/support/export/export.h
index 8d5a0d3004ef..ce561f9fbd3e 100644
--- a/support/export/export.h
+++ b/support/export/export.h
@@ -29,6 +29,8 @@ int		v4clients_process(fd_set *fdset);
 struct nfs_fh_len *
 		cache_get_filehandle(nfs_export *exp, int len, char *p);
 int		cache_export(nfs_export *exp, char *path);
+int		cache_fork_workers(char *prog, int num_threads);
+void		cache_wait_for_workers(char *prog);
 
 bool ipaddr_client_matches(nfs_export *exp, struct addrinfo *ai);
 bool namelist_client_matches(nfs_export *exp, char *dom);
diff --git a/utils/exportd/exportd.c b/utils/exportd/exportd.c
index 6f866445efc2..d07a885c6763 100644
--- a/utils/exportd/exportd.c
+++ b/utils/exportd/exportd.c
@@ -16,7 +16,6 @@
 #include <string.h>
 #include <getopt.h>
 #include <errno.h>
-#include <wait.h>
 
 #include "nfslib.h"
 #include "conffile.h"
@@ -54,90 +53,19 @@ static char shortopts[] = "d:fghs:t:liT:";
  */
 inline static void set_signals(void);
 
-/* Wait for all worker child processes to exit and reap them */
-static void
-wait_for_workers (void)
-{
-	int status;
-	pid_t pid;
-
-	for (;;) {
-
-		pid = waitpid(0, &status, 0);
-
-		if (pid < 0) {
-			if (errno == ECHILD)
-				return; /* no more children */
-			xlog(L_FATAL, "mountd: can't wait: %s\n",
-					strerror(errno));
-		}
-
-		/* Note: because we SIG_IGN'd SIGCHLD earlier, this
-		 * does not happen on 2.6 kernels, and waitpid() blocks
-		 * until all the children are dead then returns with
-		 * -ECHILD.  But, we don't need to do anything on the
-		 * death of individual workers, so we don't care. */
-		xlog(L_NOTICE, "mountd: reaped child %d, status %d\n",
-				(int)pid, status);
-	}
-}
-
 inline void
 cleanup_lockfiles (void)
 {
 	unlink(etab.lockfn);
 }
 
-/* Fork num_threads worker children and wait for them */
 static void
-fork_workers(void)
-{
-	int i;
-	pid_t pid;
-
-	xlog(L_NOTICE, "mountd: starting %d threads\n", num_threads);
-
-	for (i = 0 ; i < num_threads ; i++) {
-		pid = fork();
-		if (pid < 0) {
-			xlog(L_FATAL, "mountd: cannot fork: %s\n",
-					strerror(errno));
-		}
-		if (pid == 0) {
-			/* worker child */
-
-			/* Re-enable the default action on SIGTERM et al
-			 * so that workers die naturally when sent them.
-			 * Only the parent unregisters with pmap and
-			 * hence needs to do special SIGTERM handling. */
-			struct sigaction sa;
-			sa.sa_handler = SIG_DFL;
-			sa.sa_flags = 0;
-			sigemptyset(&sa.sa_mask);
-			sigaction(SIGHUP, &sa, NULL);
-			sigaction(SIGINT, &sa, NULL);
-			sigaction(SIGTERM, &sa, NULL);
-
-			/* fall into my_svc_run in caller */
-			return;
-		}
-	}
-
-	/* in parent */
-	wait_for_workers();
-	cleanup_lockfiles();
-	free_state_path_names(&etab);
-	xlog(L_NOTICE, "exportd: no more workers, exiting\n");
-	exit(0);
-}
-
-static void 
 killer (int sig)
 {
 	if (num_threads > 1) {
 		/* play Kronos and eat our children */
 		kill(0, SIGTERM);
-		wait_for_workers();
+		cache_wait_for_workers("exportd");
 	}
 	cleanup_lockfiles();
 	free_state_path_names(&etab);
@@ -145,6 +73,7 @@ killer (int sig)
 
 	exit(0);
 }
+
 static void
 sig_hup (int UNUSED(sig))
 {
@@ -152,8 +81,9 @@ sig_hup (int UNUSED(sig))
 	xlog (L_NOTICE, "Received SIGHUP... Ignoring.\n");
 	return;
 }
-inline static void 
-set_signals(void) 
+
+inline static void
+set_signals(void)
 {
 	struct sigaction sa;
 
@@ -295,9 +225,13 @@ main(int argc, char **argv)
 	 */
 	cache_open();
 
-	if (num_threads > 1)
-		fork_workers();
-
+	if (cache_fork_workers(progname, num_threads) == 0) {
+		/* We forked, waited, and now need to clean up */
+		cleanup_lockfiles();
+		free_state_path_names(&etab);
+		xlog(L_NOTICE, "%s: no more workers, exiting\n", progname);
+		exit(0);
+	}
 
 	v4clients_init();
 
diff --git a/utils/mountd/mountd.c b/utils/mountd/mountd.c
index f9c62cded66c..dbd5546df61c 100644
--- a/utils/mountd/mountd.c
+++ b/utils/mountd/mountd.c
@@ -21,7 +21,6 @@
 #include <errno.h>
 #include <fcntl.h>
 #include <sys/resource.h>
-#include <sys/wait.h>
 
 #include "conffile.h"
 #include "xmalloc.h"
@@ -119,90 +118,17 @@ cleanup_lockfiles (void)
 	unlink(rmtab.lockfn);
 }
 
-/* Wait for all worker child processes to exit and reap them */
-static void
-wait_for_workers (void)
-{
-	int status;
-	pid_t pid;
-
-	for (;;) {
-
-		pid = waitpid(0, &status, 0);
-
-		if (pid < 0) {
-			if (errno == ECHILD)
-				return; /* no more children */
-			xlog(L_FATAL, "mountd: can't wait: %s\n",
-					strerror(errno));
-		}
-
-		/* Note: because we SIG_IGN'd SIGCHLD earlier, this
-		 * does not happen on 2.6 kernels, and waitpid() blocks
-		 * until all the children are dead then returns with
-		 * -ECHILD.  But, we don't need to do anything on the
-		 * death of individual workers, so we don't care. */
-		xlog(L_NOTICE, "mountd: reaped child %d, status %d\n",
-				(int)pid, status);
-	}
-}
-
-/* Fork num_threads worker children and wait for them */
-static void
-fork_workers(void)
-{
-	int i;
-	pid_t pid;
-
-	xlog(L_NOTICE, "mountd: starting %d threads\n", num_threads);
-
-	for (i = 0 ; i < num_threads ; i++) {
-		pid = fork();
-		if (pid < 0) {
-			xlog(L_FATAL, "mountd: cannot fork: %s\n",
-					strerror(errno));
-		}
-		if (pid == 0) {
-			/* worker child */
-
-			/* Re-enable the default action on SIGTERM et al
-			 * so that workers die naturally when sent them.
-			 * Only the parent unregisters with pmap and
-			 * hence needs to do special SIGTERM handling. */
-			struct sigaction sa;
-			sa.sa_handler = SIG_DFL;
-			sa.sa_flags = 0;
-			sigemptyset(&sa.sa_mask);
-			sigaction(SIGHUP, &sa, NULL);
-			sigaction(SIGINT, &sa, NULL);
-			sigaction(SIGTERM, &sa, NULL);
-
-			/* fall into my_svc_run in caller */
-			return;
-		}
-	}
-
-	/* in parent */
-	wait_for_workers();
-	unregister_services();
-	cleanup_lockfiles();
-	free_state_path_names(&etab);
-	free_state_path_names(&rmtab);
-	xlog(L_NOTICE, "mountd: no more workers, exiting\n");
-	exit(0);
-}
-
 /*
  * Signal handler.
  */
-static void 
+static void
 killer (int sig)
 {
 	unregister_services();
 	if (num_threads > 1) {
 		/* play Kronos and eat our children */
 		kill(0, SIGTERM);
-		wait_for_workers();
+		cache_wait_for_workers("mountd");
 	}
 	cleanup_lockfiles();
 	free_state_path_names(&etab);
@@ -220,7 +146,7 @@ sig_hup (int UNUSED(sig))
 }
 
 bool_t
-mount_null_1_svc(struct svc_req *rqstp, void *UNUSED(argp), 
+mount_null_1_svc(struct svc_req *rqstp, void *UNUSED(argp),
 	void *UNUSED(resp))
 {
 	struct sockaddr *sap = nfs_getrpccaller(rqstp->rq_xprt);
@@ -922,8 +848,15 @@ main(int argc, char **argv)
 	 */
 	cache_open();
 
-	if (num_threads > 1)
-		fork_workers();
+	if (cache_fork_workers("mountd", num_threads) == 0) {
+		/* We forked, waited, and now need to clean up */
+		unregister_services();
+		cleanup_lockfiles();
+		free_state_path_names(&etab);
+		free_state_path_names(&rmtab);
+		xlog(L_NOTICE, "mountd: no more workers, exiting\n");
+		exit(0);
+	}
 
 	nfsd_path_init();
 	v4clients_init();
diff --git a/utils/mountd/mountd.h b/utils/mountd/mountd.h
index d30775313f66..bd5c9576d8ad 100644
--- a/utils/mountd/mountd.h
+++ b/utils/mountd/mountd.h
@@ -51,13 +51,4 @@ void		mountlist_del(char *host, const char *path);
 void		mountlist_del_all(const struct sockaddr *sap);
 mountlist	mountlist_list(void);
 
-void		cache_open(void);
-struct nfs_fh_len *
-		cache_get_filehandle(nfs_export *exp, int len, char *p);
-int		cache_export(nfs_export *exp, char *path);
-
-bool ipaddr_client_matches(nfs_export *exp, struct addrinfo *ai);
-bool namelist_client_matches(nfs_export *exp, char *dom);
-bool client_matches(nfs_export *exp, char *dom, struct addrinfo *ai);
-
 #endif /* MOUNTD_H */
-- 
2.42.0



* [PATCH 5/6] Share process_loop code between mountd and exportd.
  2023-10-23  1:58 [PATCH 0/6 nfs-utils v2] fixes for error handling in nfsd_fh NeilBrown
                   ` (3 preceding siblings ...)
  2023-10-23  1:58 ` [PATCH 4/6] Move fork_workers() and wait_for_workers() in cache.c NeilBrown
@ 2023-10-23  1:58 ` NeilBrown
  2023-10-23  1:58 ` [PATCH 6/6] cache: periodically retry requests that couldn't be answered NeilBrown
  2023-10-25 17:37 ` [PATCH 0/6 nfs-utils v2] fixes for error handling in nfsd_fh Steve Dickson
  6 siblings, 0 replies; 8+ messages in thread
From: NeilBrown @ 2023-10-23  1:58 UTC (permalink / raw)
  To: Steve Dickson; +Cc: linux-nfs, Trond Myklebust

There is substantial commonality between cache_process_loop() used by
exportd and my_svc_run() used by mountd.

Remove the looping from cache_process_loop(), renaming it to
cache_process(), and call it in a loop from exportd.
my_svc_run() now calls cache_process() for all the common functionality
and adds code specific to being an RPC server.

Signed-off-by: NeilBrown <neilb@suse.de>
---
 support/export/cache.c  | 49 +++++++++++++++++++++--------------------
 support/export/export.h |  1 +
 utils/exportd/exportd.c |  5 +++--
 utils/mountd/svc_run.c  | 23 ++++---------------
 4 files changed, 33 insertions(+), 45 deletions(-)

diff --git a/support/export/cache.c b/support/export/cache.c
index 1874156af5e5..a01eba4f6619 100644
--- a/support/export/cache.c
+++ b/support/export/cache.c
@@ -1602,40 +1602,41 @@ int cache_process_req(fd_set *readfds)
 }
 
 /**
- * cache_process_loop - process incoming upcalls
+ * cache_process - process incoming upcalls
+ * Returns -ve on error, or number of fds in svc_fds
+ * that might need processing.
  */
-void cache_process_loop(void)
+int cache_process(fd_set *readfds)
 {
-	fd_set	readfds;
+	fd_set fdset;
 	int	selret;
 
-	FD_ZERO(&readfds);
-
-	for (;;) {
-
-		cache_set_fds(&readfds);
-		v4clients_set_fds(&readfds);
-
-		selret = select(FD_SETSIZE, &readfds,
-				(void *) 0, (void *) 0, (struct timeval *) 0);
+	if (!readfds) {
+		FD_ZERO(&fdset);
+		readfds = &fdset;
+	}
+	cache_set_fds(readfds);
+	v4clients_set_fds(readfds);
 
+	selret = select(FD_SETSIZE, readfds,
+			(void *) 0, (void *) 0, (struct timeval *) 0);
 
-		switch (selret) {
-		case -1:
-			if (errno == EINTR || errno == ECONNREFUSED
-			 || errno == ENETUNREACH || errno == EHOSTUNREACH)
-				continue;
-			xlog(L_ERROR, "my_svc_run() - select: %m");
-			return;
+	switch (selret) {
+	case -1:
+		if (errno == EINTR || errno == ECONNREFUSED
+		    || errno == ENETUNREACH || errno == EHOSTUNREACH)
+			return 0;
+		return -1;
 
-		default:
-			cache_process_req(&readfds);
-			v4clients_process(&readfds);
-		}
+	default:
+		selret -= cache_process_req(readfds);
+		selret -= v4clients_process(readfds);
+		if (selret < 0)
+			selret = 0;
 	}
+	return selret;
 }
 
-
 /*
  * Give IP->domain and domain+path->options to kernel
  * % echo nfsd $IP  $[now+DEFAULT_TTL] $domain > /proc/net/rpc/auth.unix.ip/channel
diff --git a/support/export/export.h b/support/export/export.h
index ce561f9fbd3e..e2009ccdc443 100644
--- a/support/export/export.h
+++ b/support/export/export.h
@@ -31,6 +31,7 @@ struct nfs_fh_len *
 int		cache_export(nfs_export *exp, char *path);
 int		cache_fork_workers(char *prog, int num_threads);
 void		cache_wait_for_workers(char *prog);
+int		cache_process(fd_set *readfds);
 
 bool ipaddr_client_matches(nfs_export *exp, struct addrinfo *ai);
 bool namelist_client_matches(nfs_export *exp, char *dom);
diff --git a/utils/exportd/exportd.c b/utils/exportd/exportd.c
index d07a885c6763..a2e370ac506f 100644
--- a/utils/exportd/exportd.c
+++ b/utils/exportd/exportd.c
@@ -236,9 +236,10 @@ main(int argc, char **argv)
 	v4clients_init();
 
 	/* Process incoming upcalls */
-	cache_process_loop();
+	while (cache_process(NULL) >= 0)
+		;
 
-	xlog(L_ERROR, "%s: process loop terminated unexpectedly. Exiting...\n",
+	xlog(L_ERROR, "%s: process loop terminated unexpectedly(%m). Exiting...\n",
 		progname);
 
 	free_state_path_names(&etab);
diff --git a/utils/mountd/svc_run.c b/utils/mountd/svc_run.c
index 167b9757bde2..2aaf3756bbb1 100644
--- a/utils/mountd/svc_run.c
+++ b/utils/mountd/svc_run.c
@@ -97,28 +97,13 @@ my_svc_run(void)
 	int	selret;
 
 	for (;;) {
-
 		readfds = svc_fdset;
-		cache_set_fds(&readfds);
-		v4clients_set_fds(&readfds);
-
-		selret = select(FD_SETSIZE, &readfds,
-				(void *) 0, (void *) 0, (struct timeval *) 0);
-
-
-		switch (selret) {
-		case -1:
-			if (errno == EINTR || errno == ECONNREFUSED
-			 || errno == ENETUNREACH || errno == EHOSTUNREACH)
-				continue;
+		selret = cache_process(&readfds);
+		if (selret < 0) {
 			xlog(L_ERROR, "my_svc_run() - select: %m");
 			return;
-
-		default:
-			selret -= cache_process_req(&readfds);
-			selret -= v4clients_process(&readfds);
-			if (selret)
-				svc_getreqset(&readfds);
 		}
+		if (selret)
+			svc_getreqset(&readfds);
 	}
 }
-- 
2.42.0



* [PATCH 6/6] cache: periodically retry requests that couldn't be answered.
  2023-10-23  1:58 [PATCH 0/6 nfs-utils v2] fixes for error handling in nfsd_fh NeilBrown
                   ` (4 preceding siblings ...)
  2023-10-23  1:58 ` [PATCH 5/6] Share process_loop code between mountd and exportd NeilBrown
@ 2023-10-23  1:58 ` NeilBrown
  2023-10-25 17:37 ` [PATCH 0/6 nfs-utils v2] fixes for error handling in nfsd_fh Steve Dickson
  6 siblings, 0 replies; 8+ messages in thread
From: NeilBrown @ 2023-10-23  1:58 UTC (permalink / raw)
  To: Steve Dickson; +Cc: linux-nfs, Trond Myklebust

Requests from the kernel to map the fsid from a filehandle to a path
name sometimes cannot be answered because the filesystem isn't
available now but might be available later.

This happens if an export is marked "mountpoint" but the mountpoint
isn't currently mounted.  In this case it might get mounted in the
future.

It also happens if an NFS filesystem is being re-exported and the server
is unresponsive.  In that case (if it was mounted "softerr") we get
ETIMEDOUT from a stat() attempt and so cannot give either a positive or
negative response.

These cases are currently handled poorly.  No answer is returned to the
kernel so it will continue waiting for an answer - and never get one
even if the NFS server comes back or the mountpoint is mounted.

We cannot report a soft error to the kernel, so we must retry ourselves.

With this patch we record the request when the lookup fails with
dev_missing or similar and retry every 2 minutes.

Signed-off-by: NeilBrown <neilb@suse.de>
---
 support/export/cache.c | 121 +++++++++++++++++++++++++++++++++++------
 1 file changed, 103 insertions(+), 18 deletions(-)

diff --git a/support/export/cache.c b/support/export/cache.c
index a01eba4f6619..6c0a44a3a209 100644
--- a/support/export/cache.c
+++ b/support/export/cache.c
@@ -759,7 +759,15 @@ static struct addrinfo *lookup_client_addr(char *dom)
 	return ret;
 }
 
-static void nfsd_fh(int f)
+#define RETRY_SEC 120
+struct delayed {
+	char *message;
+	time_t last_attempt;
+	int f;
+	struct delayed *next;
+} *delayed;
+
+static int nfsd_handle_fh(int f, char *bp, int blen)
 {
 	/* request are:
 	 *  domain fsidtype fsid
@@ -777,21 +785,13 @@ static void nfsd_fh(int f)
 	nfs_export *exp;
 	int i;
 	int dev_missing = 0;
-	char buf[RPC_CHAN_BUF_SIZE], *bp;
-	int blen;
+	char buf[RPC_CHAN_BUF_SIZE];
 	int did_uncover = 0;
-
-	blen = cache_read(f, buf, sizeof(buf));
-	if (blen <= 0 || buf[blen-1] != '\n') return;
-	buf[blen-1] = 0;
-
-	xlog(D_CALL, "nfsd_fh: inbuf '%s'", buf);
-
-	bp = buf;
+	int ret = 0;
 
 	dom = malloc(blen);
 	if (dom == NULL)
-		return;
+		return ret;
 	if (qword_get(&bp, dom, blen) <= 0)
 		goto out;
 	if (qword_get_int(&bp, &fsidtype) != 0)
@@ -893,8 +893,10 @@ static void nfsd_fh(int f)
 		/* The missing dev could be what we want, so just be
 		 * quiet rather than returning stale yet
 		 */
-		if (dev_missing)
+		if (dev_missing) {
+			ret = 1;
 			goto out;
+		}
 	} else if (found->e_mountpoint &&
 	    !is_mountpoint(found->e_mountpoint[0]?
 			   found->e_mountpoint:
@@ -904,7 +906,7 @@ static void nfsd_fh(int f)
 		   xlog(L_WARNING, "%s not exported as %d not a mountpoint",
 		   found->e_path, found->e_mountpoint);
 		 */
-		/* FIXME we need to make sure we re-visit this later */
+		ret = 1;
 		goto out;
 	}
 
@@ -933,7 +935,68 @@ out:
 		free(found_path);
 	nfs_freeaddrinfo(ai);
 	free(dom);
-	xlog(D_CALL, "nfsd_fh: found %p path %s", found, found ? found->e_path : NULL);
+	if (!ret)
+		xlog(D_CALL, "nfsd_fh: found %p path %s",
+		     found, found ? found->e_path : NULL);
+	return ret;
+}
+
+static void nfsd_fh(int f)
+{
+	struct delayed *d, **dp;
+	char inbuf[RPC_CHAN_BUF_SIZE];
+	int blen;
+
+	blen = cache_read(f, inbuf, sizeof(inbuf));
+	if (blen <= 0 || inbuf[blen-1] != '\n') return;
+	inbuf[blen-1] = 0;
+
+	xlog(D_CALL, "nfsd_fh: inbuf '%s'", inbuf);
+
+	if (nfsd_handle_fh(f, inbuf, blen) == 0)
+		return;
+	/* We don't have a definitive answer to give the kernel.
+	 * This is because an export marked "mountpoint" isn't a
+	 * mountpoint, or because a stat of a mountpoint fails with
+	 * a strange error like ETIMEDOUT as is possible with an
+	 * NFS mount marked "softerr" which is being re-exported.
+	 *
+	 * We cannot tell the kernel to retry, so we have to
+	 * retry ourselves.
+	 */
+	d = malloc(sizeof(*d));
+
+	if (!d)
+		return;
+	d->message = strndup(inbuf, blen);
+	if (!d->message) {
+		free(d);
+		return;
+	}
+	d->f = f;
+	d->last_attempt = time(NULL);
+	d->next = NULL;
+	dp = &delayed;
+	while (*dp)
+		dp = &(*dp)->next;
+	*dp = d;
+}
+
+static void nfsd_retry_fh(struct delayed *d)
+{
+	struct delayed **dp;
+
+	if (nfsd_handle_fh(d->f, d->message, strlen(d->message)+1) == 0) {
+		free(d->message);
+		free(d);
+		return;
+	}
+	d->last_attempt = time(NULL);
+	d->next = NULL;
+	dp = &delayed;
+	while (*dp)
+		dp = &(*dp)->next;
+	*dp = d;
 }
 
 #ifdef HAVE_JUNCTION_SUPPORT
@@ -1512,7 +1575,7 @@ static void nfsd_export(int f)
 			 * This will cause it not to appear in the V4 Pseudo-root
 			 * and so a "mount" of this path will fail, just like with
 			 * V3.
-			 * And filehandle for this mountpoint from an earlier
+			 * Any filehandle for this mountpoint from an earlier
 			 * mount will block in nfsd.fh lookup.
 			 */
 			xlog(L_WARNING,
@@ -1610,6 +1673,7 @@ int cache_process(fd_set *readfds)
 {
 	fd_set fdset;
 	int	selret;
+	struct timeval tv = { 24*3600, 0 };
 
 	if (!readfds) {
 		FD_ZERO(&fdset);
@@ -1618,8 +1682,29 @@ int cache_process(fd_set *readfds)
 	cache_set_fds(readfds);
 	v4clients_set_fds(readfds);
 
-	selret = select(FD_SETSIZE, readfds,
-			(void *) 0, (void *) 0, (struct timeval *) 0);
+	if (delayed) {
+		time_t now = time(NULL);
+		time_t delay;
+		if (delayed->last_attempt > now)
+			/* Clock updated - retry immediately */
+			delayed->last_attempt = now - RETRY_SEC;
+		delay = delayed->last_attempt + RETRY_SEC - now;
+		if (delay < 0)
+			delay = 0;
+		tv.tv_sec = delay;
+	}
+	selret = select(FD_SETSIZE, readfds, NULL, NULL, &tv);
+
+	if (delayed) {
+		time_t now = time(NULL);
+		struct delayed *d = delayed;
+
+		if (d->last_attempt + RETRY_SEC <= now) {
+			delayed = d->next;
+			d->next = NULL;
+			nfsd_retry_fh(d);
+		}
+	}
 
 	switch (selret) {
 	case -1:
-- 
2.42.0



* Re: [PATCH 0/6 nfs-utils v2] fixes for error handling in nfsd_fh
  2023-10-23  1:58 [PATCH 0/6 nfs-utils v2] fixes for error handling in nfsd_fh NeilBrown
                   ` (5 preceding siblings ...)
  2023-10-23  1:58 ` [PATCH 6/6] cache: periodically retry requests that couldn't be answered NeilBrown
@ 2023-10-25 17:37 ` Steve Dickson
  6 siblings, 0 replies; 8+ messages in thread
From: Steve Dickson @ 2023-10-25 17:37 UTC (permalink / raw)
  To: NeilBrown; +Cc: linux-nfs, Trond Myklebust



On 10/22/23 9:58 PM, NeilBrown wrote:
> Hi,
>   this is a revised version of my previous series with the same name.
>   The first two patches are unchanged.
>   The third patch, which was an RFC, has been replaced with the last
>   patch which actually addresses the issue rather than skirting
>   around it.
> 
>   Patch 3 here is a revert of a change I noticed while exploring the
>   code.  cache_open() must be called BEFORE forking workers, as explained
>   in that patch.
>   Patches 4 and 5 factor out common code, which makes the final patch
>   simpler.
> 
>   The core issue is that sometimes mountd (or exportd) cannot give a
>   definite "yes" or "no" to a request to map an fsid to a path name.
>   In these cases the only safe option is to delay and try again.
> 
>   This only becomes relevant if a filesystem is mounted by a client, then
>   the server restarts (or the export cache is flushed) and the client
>   tries to use a filehandle that it already has, but that server cannot
>   find it and cannot be sure it doesn't exist.  This can happen when an
>   export is marked "mountpoint" or when a re-exported NFS filesystem
>   cannot contact the server and reports an ETIMEDOUT error.  In these
>   cases we want the client to continue waiting (which it does) and also
>   want mountd/exportd to periodically check if the target filesystem has
>   come back (which it currently does not).
>   With the current code, once this situation happens and the client is
>   waiting, the client will continue to wait indefinitely even if the
>   target filesystem becomes available.  The client can only continue if
>   the NFS server is restarted or the export cache is flushed.  After the
>   patch, within 2 minutes of the target filesystem becoming
>   available again, mountd will tell the kernel, and when the client asks
>   again it will be allowed to proceed.
> 
> NeilBrown
> 
> 
>   [PATCH 1/6] export: fix handling of error from match_fsid()
>   [PATCH 2/6] export: add EACCES to the list of known
>   [PATCH 3/6] export: move cache_open() before workers are forked.
>   [PATCH 4/6] Move fork_workers() and wait_for_workers() in cache.c
>   [PATCH 5/6] Share process_loop code between mountd and exportd.
>   [PATCH 6/6] cache: periodically retry requests that couldn't be
> 
Committed... (tag: nfs-utils-2-6-4-rc5)

steved.



end of thread, other threads:[~2023-10-25 17:38 UTC | newest]
