* RE: NSM lock recovery fails too often
@ 2004-03-09 14:15 Lever, Charles
  2004-03-09 14:22 ` Olaf Kirch
  0 siblings, 1 reply; 10+ messages in thread
From: Lever, Charles @ 2004-03-09 14:15 UTC
  To: Olaf Kirch; +Cc: nfs

hi olaf-

> Here's the promised sm-notify utility.

looks clean!

i thought the purpose of moving NSM into the kernel was
to eliminate the need for any user-level programs because
sometimes they don't get run at reboot..?

one of the issues we've observed with reboot notification
is that it fails entirely if the network stack on the client
hasn't been initialized when NSM starts.  for example, if a
client is DHCP configured, and the DHCP service is slow,
the NSM reboot notification will run and fail before the
client's network has been configured.  it also means that
the success of NSM reboot recovery is entirely dependent
on exactly when this program is run during start up.

how does your recovery program handle this case?


-------------------------------------------------------
This SF.Net email is sponsored by: IBM Linux Tutorials
Free Linux tutorial presented by Daniel Robbins, President and CEO of
GenToo technologies. Learn everything from fundamentals to system
administration.http://ads.osdn.com/?ad_id=1470&alloc_id=3638&op=click
_______________________________________________
NFS maillist  -  NFS@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nfs

* RE: NSM lock recovery fails too often
@ 2004-03-12 16:47 Lever, Charles
  0 siblings, 0 replies; 10+ messages in thread
From: Lever, Charles @ 2004-03-12 16:47 UTC
  To: nfs

hi all-

so we need two solutions:  one for legacy 2.4 and one going forward
on 2.6.

for 2.4, i'd like to see one of the statd patch alternatives
incorporated into the nfs-utils distribution as soon as possible
so that all Linux distributors can pick up an approved patch for
their 2.4-based distributions.  which alternative is the favorite?
this should be addressed quickly.

for 2.6, olaf's work has promise.


> -----Original Message-----
> From: Olaf Kirch [mailto:okir@suse.de]
> Sent: Tuesday, March 09, 2004 7:59 AM
> To: Trond Myklebust
> Cc: Lever, Charles; nfs@lists.sourceforge.net
> Subject: Re: [NFS] NSM lock recovery fails too often
>
>
> On Tue, Mar 09, 2004 at 10:47:42AM -0500, Trond Myklebust wrote:
> > On Tue, 09/03/2004 at 10:10, Olaf Kirch wrote:
> > > What I was referring to was the ability to have several RPC programs
> > > on a single svc_sock server side; e.g. NFS and NFSACL, or NLM and
> > > NSM. Pretty much the way svc_register from the good ole sunrpc code
> > > works.
> >
> > Oh, sorry...
> >
> > Hmm... Any reason why you couldn't put them all on port 2049 with this
> > patch? Would that be less efficient than having two sets of threads?
>
> No, you could put all of them on the same port. The problem
> is that you want lockd and nfsd to be two different
> processes, so you can shut down nfsd without disrupting lockd.
>
> Olaf
> --
> Olaf Kirch     |  Stop wasting entropy - start using predictable
> okir@suse.de   |  tempfile names today!
> ---------------+
>



* NSM lock recovery fails too often
@ 2004-03-09  4:30 Lever, Charles
  2004-03-09 10:56 ` Olaf Kirch
  0 siblings, 1 reply; 10+ messages in thread
From: Lever, Charles @ 2004-03-09  4:30 UTC
  To: nfs; +Cc: Olaf Kirch



the way things work today, NLM is in-kernel on most Linux systems,
and it uses an in-kernel equivalent of gethostname(3) to determine
the client's hostname for use when making NLM requests.  NSM,
though, is still in user-land, and uses gethostbyname(3) to
determine the client's hostname.  very often this results in
NSM using a different client hostname string than NLM, thus
causing lock recovery to fail.

NLM and NSM must use the same hostname string.
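
A minimal sketch of how the two lookups can diverge, for illustration only. The helper names (local_name, canonical_name, same_hostname, report_hostname_mismatch) and HOSTBUF_LEN are hypothetical and are not taken from lockd or nfs-utils, and the byte-for-byte comparison is an assumption about how a server matches the two names.

```c
#include <assert.h>
#include <netdb.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

#define HOSTBUF_LEN 1024   /* illustrative size; statd itself uses SM_MAXSTRLEN */

/* The name as gethostname(3) reports it, i.e. the NLM-style name. */
int local_name(char *out, size_t len)
{
    assert(out != NULL && len > 0);
    if (gethostname(out, len) == -1)
        return -1;
    out[len - 1] = '\0';   /* gethostname may not terminate on truncation */
    return 0;
}

/* The name after a gethostbyname(3) lookup, i.e. the NSM-style name.
 * Falls back to the input name when the lookup fails. */
void canonical_name(const char *name, char *out, size_t len)
{
    struct hostent *hp;

    assert(name != NULL && out != NULL && len > 0);
    hp = gethostbyname(name);
    strncpy(out, hp != NULL ? hp->h_name : name, len - 1);
    out[len - 1] = '\0';
}

/* Assumption: the server treats the names as plain strings. */
int same_hostname(const char *nlm_name, const char *nsm_name)
{
    return strcmp(nlm_name, nsm_name) == 0;
}

/* Print both names and whether they would be seen as the same client. */
void report_hostname_mismatch(void)
{
    char nlm_name[HOSTBUF_LEN], nsm_name[HOSTBUF_LEN];

    if (local_name(nlm_name, sizeof(nlm_name)) == -1) {
        perror("gethostname");
        return;
    }
    canonical_name(nlm_name, nsm_name, sizeof(nsm_name));
    printf("NLM-style name: %s\nNSM-style name: %s\n%s\n",
           nlm_name, nsm_name,
           same_hostname(nlm_name, nsm_name) ? "names match"
                                             : "names differ");
}
```

On a client whose gethostname(3) result is a short name and whose resolver returns a fully qualified name, report_hostname_mismatch() would print two different strings, which is the situation described above.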

this is a real bug that many of NetApp's customers hit all
the time.  the problem is exposed only after a client crashes
and recovers, not when it shuts down normally and reboots.

i attach two patches that accomplish a solution in different
ways.

first is a patch by Olaf Kirch against nfs-utils-1.0.1 that
adds an option to disable the extra gethostbyname(3) call in
rpc.statd.  second is a reductionist approach -- just excise
that call entirely.  the first patch allows backwards
compatibility with the user-level lockd, which nfs-utils still
contains.  the second makes rpc.statd match the behavior of
the in-kernel lockd unconditionally.

perhaps the best solution is to use an option as Olaf's patch
does, but to make the default behavior match the in-kernel
lockd's behavior, not the user-level lockd's behavior.  or,
maybe we use the second patch and simply remove the user
level lockd from nfs-utils.
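
A rough sketch of the "option, but with the in-kernel default" idea, under stated assumptions. choose_statd_name, statd_name_from_system and the canonicalize flag are hypothetical names; neither attached patch contains them. The flag is simply the inverse of use_local_hostname in Olaf's patch, so the plain gethostname(3) result is used unless canonicalization is requested.

```c
#include <assert.h>
#include <netdb.h>
#include <string.h>
#include <unistd.h>

/* Pick the name statd would advertise.
 * local:        result of gethostname(3)
 * canonical:    result of a gethostbyname(3) lookup, or NULL if it failed
 * canonicalize: 0 (default) matches in-kernel lockd, nonzero keeps the
 *               old behavior expected by the user-level lockd */
void choose_statd_name(const char *local, const char *canonical,
                       int canonicalize, char *out, size_t len)
{
    const char *chosen = local;

    assert(local != NULL && out != NULL && len > 0);
    if (canonicalize && canonical != NULL)
        chosen = canonical;
    strncpy(out, chosen, len - 1);
    out[len - 1] = '\0';
}

/* Gather both candidate names from the running system, then choose.
 * Returns -1 if gethostname(3) fails, 0 otherwise. */
int statd_name_from_system(int canonicalize, char *out, size_t len)
{
    char local[1024];              /* illustrative size, not SM_MAXSTRLEN */
    struct hostent *hp = NULL;

    if (gethostname(local, sizeof(local)) == -1)
        return -1;
    local[sizeof(local) - 1] = '\0';
    if (canonicalize)              /* skip the lookup entirely by default */
        hp = gethostbyname(local);
    choose_statd_name(local, hp != NULL ? hp->h_name : NULL,
                      canonicalize, out, len);
    return 0;
}
```

With canonicalize left at 0 no resolver call is made at all, which is equivalent to the second patch; setting it restores the lookup for anyone still running the user-level lockd.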

comments?


[-- Attachment #2: nfs-utils-1.0.1-local-hostname.patch --]
[-- Type: TEXT/PLAIN, Size: 2981 bytes --]

diff -ur nfs-utils-1.0.1/utils/statd/statd.c nfs-utils-1.0.1.local-name/utils/statd/statd.c
--- nfs-utils-1.0.1/utils/statd/statd.c	2004-02-13 16:37:35.000000000 +0100
+++ nfs-utils-1.0.1.local-name/utils/statd/statd.c	2004-02-13 16:37:06.000000000 +0100
@@ -27,6 +27,7 @@
 
 short int restart = 0;
 int	run_mode = 0;		/* foreground logging mode */
+int	use_local_hostname = 0;
 
 /* LH - I had these local to main, but it seemed silly to have 
  * two copies of each - one in main(), one static in log.c... 
@@ -43,6 +44,7 @@
 	{ "outgoing-port", 1, 0, 'o' },
 	{ "port", 1, 0, 'p' },
 	{ "name", 1, 0, 'n' },
+	{ "use-local-hostname", 0, 0, 'l' },
 	{ NULL, 0, 0, 0 }
 };
 
@@ -124,6 +126,8 @@
 	fprintf(stderr,"      -h, -?, --help       Print this help screen.\n");
 	fprintf(stderr,"      -F, --foreground     Foreground (no-daemon mode)\n");
 	fprintf(stderr,"      -d, --no-syslog      Verbose logging to stderr.  Foreground mode only.\n");
+	fprintf(stderr,"      -l, --use-local-hostname\n"
+	               "                           Don't add a domain to the hostname in NOTIFY calls\n");
 	fprintf(stderr,"      -p, --port           Port to listen on\n");
 	fprintf(stderr,"      -o, --outgoing-port  Port for outgoing connections\n");
 	fprintf(stderr,"      -V, -v, --version    Display version information and exit.\n");
@@ -161,7 +165,7 @@
 	MY_NAME = NULL;
 
 	/* Process command line switches */
-	while ((arg = getopt_long(argc, argv, "h?vVFdn:p:o:", longopts, NULL)) != EOF) {
+	while ((arg = getopt_long(argc, argv, "h?vVFdln:p:o:", longopts, NULL)) != EOF) {
 		switch (arg) {
 		case 'V':	/* Version */
 		case 'v':
@@ -191,6 +195,9 @@
 				exit(1);
 			}
 			break;
+		case 'l':
+			use_local_hostname = 1;
+			break;
 		case 'n':	/* Specify local hostname */
 			MY_NAME = xstrdup(optarg);
 			break;
diff -ur nfs-utils-1.0.1/utils/statd/statd.h nfs-utils-1.0.1.local-name/utils/statd/statd.h
--- nfs-utils-1.0.1/utils/statd/statd.h	2000-10-05 21:11:39.000000000 +0200
+++ nfs-utils-1.0.1.local-name/utils/statd/statd.h	2004-02-13 16:33:53.000000000 +0100
@@ -48,6 +48,7 @@
 stat_chge		SM_stat_chge;
 #define MY_NAME		SM_stat_chge.mon_name
 #define MY_STATE	SM_stat_chge.state
+extern int		use_local_hostname;
 
 /*
  * Some timeout values.  (Timeout values are in whole seconds.)
diff -ur nfs-utils-1.0.1/utils/statd/state.c nfs-utils-1.0.1.local-name/utils/statd/state.c
--- nfs-utils-1.0.1/utils/statd/state.c	2004-02-13 16:37:35.000000000 +0100
+++ nfs-utils-1.0.1.local-name/utils/statd/state.c	2004-02-13 16:35:29.000000000 +0100
@@ -64,6 +64,11 @@
     if (gethostname (fullhost, SM_MAXSTRLEN) == -1)
       die ("gethostname: %s", strerror (errno));
 
+    if (use_local_hostname) {
+      MY_NAME = xstrdup (fullhost);
+      return;
+    }
+
     if ((hostinfo = gethostbyname (fullhost)) == NULL)
       log (L_ERROR, "gethostbyname error for %s", fullhost);
     else {

[-- Attachment #3: nfs-utils-1.0.6-no-ghbn.patch --]
[-- Type: TEXT/PLAIN, Size: 703 bytes --]

diff -Naurp nfs-utils-1.0.6/utils/statd/state.c nfs-utils-1.0.6-fix/utils/statd/state.c
--- nfs-utils-1.0.6/utils/statd/state.c	2003-09-12 01:41:40.000000000 -0400
+++ nfs-utils-1.0.6-fix/utils/statd/state.c	2004-03-08 22:58:40.000000000 -0500
@@ -63,13 +63,6 @@ change_state (void)
     if (gethostname (fullhost, SM_MAXSTRLEN) == -1)
       die ("gethostname: %s", strerror (errno));
 
-    if ((hostinfo = gethostbyname (fullhost)) == NULL)
-      note (N_ERROR, "gethostbyname error for %s", fullhost);
-    else {
-      strncpy (fullhost, hostinfo->h_name, sizeof (fullhost) - 1);
-      fullhost[sizeof (fullhost) - 1] = '\0';
-    }
-
     MY_NAME = xstrdup (fullhost);
   }
 }


Thread overview: 10+ messages
2004-03-09 14:15 NSM lock recovery fails too often Lever, Charles
2004-03-09 14:22 ` Olaf Kirch
2004-03-09 15:04   ` Trond Myklebust
2004-03-09 15:10     ` Olaf Kirch
2004-03-09 15:47       ` Trond Myklebust
2004-03-09 15:59         ` Olaf Kirch
  -- strict thread matches above, loose matches on Subject: below --
2004-03-12 16:47 Lever, Charles
2004-03-09  4:30 Lever, Charles
2004-03-09 10:56 ` Olaf Kirch
2004-03-09 10:57   ` Olaf Kirch
