* nfsd random drop
@ 2004-04-01 10:23 Olaf Kirch
  2004-04-01 16:14 ` "Peter Lojkin" 
  2004-04-05  0:13 ` Neil Brown
  0 siblings, 2 replies; 5+ messages in thread
From: Olaf Kirch @ 2004-04-01 10:23 UTC (permalink / raw)
  To: Neil Brown; +Cc: nfs

[-- Attachment #1: Type: text/plain, Size: 3958 bytes --]

Hi,

I hate to bore you all with the same old stuff, but I'm still fighting
problems caused by nfsd's dropping active connections.

The most recent episode in this saga is a problem with the Linux
client.

Consider a network with a single Linux 2.4 based home server, a few
hundred clients, all using TCP. In Linux 2.4, nfsd starts dropping
connections when it reaches a limit of (nrthreads + 3) * 10 open
connections. With 4 threads, this means 70 connections, and with 8 threads
this means at most 110 connections. Both limits are totally inadequate for
this network. To get out of the congestion zone, we would need to bump
the number of threads to about 20, which is just silly.

The very same network has been served well with just 4 threads all
the time while using UDP.

With the 2.6 kernel, things get even worse as the formula was changed to
(nrthreads + 3) * 5, so you'll max out at 35 (4 threads) and 55 (with
8 threads), respectively. To serve 200 mounts via TCP simultaneously,
you'd need close to 40 nfsd threads.

In theory, all clients should be able to cope gracefully with such drops,
but even the Linux client runs into a couple of SNAFUs with these.

One: with a 50% probability, nfsd decides to drop the _newest_ connection,
which is the one it just accepted.  When the Linux client sees a fresh
connection go down before it was able to send anything across, it
backs off for 15 to 60 seconds, hanging the NFS mount (with 2.6.5-pre,
it's always 60 seconds). This is rather annoying for the KDE users here,
because KDE applications like to scribble to the home directory all
the time, and their entire session freezes when NFS hangs.

Two: People have reported that files vanished and/or rename/remove
operations failed.

I also think this is due to the TCP disconnects. What I think
is happening here is this:

 -      user X: unlink("blafoo") 
 -      kernel: sends NFS call to server REMOVE "blafoo" 
 -      nfsd thread A receives request, removes file blafoo, then waits for
	some file system i/o to sync the change to disk
 -      a new tcp connection comes in. Another nfsd thread B decides 
	it needs to nuke some connections, selects user X's connection 
 -      nfsd thread A decides it should send the response now,
	but finds the socket is gone. Drops the reply.
 -      client kernel: reconnect to NFS server
 -	server drops connection
 -	client waits for a while, reconnects again,
	resends REMOVE "blafoo" 
 -      NFS server: sorry, ENOENT: there's no such file "blafoo" 

Normally, the NFS server's replay cache should protect from this sort
of behavior, but the long timeouts before the client can reconnect
effectively mean the cached reply has been forgotten by the time the
retransmitted call arrives.

This is not a theoretical case; users here have reported that
files vanish mysteriously several times a day.

Three: people reported lots of messages in their syslog saying
"nfs_rename: target foo/bar busy, d_count=2". This is a variation
of the above. nfs_rename finds that someone still has foo/bar
open and decides it needs to do a sillyrename. The rename
fails with the spurious ENOENT error described above, causing
the entire rename operation to fail.

Four: Some buggy clients can't deal with it, but I think I mentioned
that already.  Prime offender is zOS; when a fresh connection is killed,
it simply propagates the error to the application, hard mount or not. I
know it's broken, but that doesn't mean we can't be gentler and make
these clients work more smoothly with Linux.

I propose to add the following two patches to the server and client. They
increase the connection limit, stop dropping the newest socket, and
add some printk's to alert the admin of the contention.

As an alternative to hardcoding a formula based on the number of threads,
I could also make the max number of connections a sysctl.

Comments,
Olaf
-- 
Olaf Kirch     |  The Hardware Gods hate me.
okir@suse.de   |
---------------+ 

[-- Attachment #2: sunrpc-svcsock-drop --]
[-- Type: text/plain, Size: 1848 bytes --]

diff -ur linux-2.6.4-nfsd/net/sunrpc/svcsock.c linux-2.6.4/net/sunrpc/svcsock.c
--- linux-2.6.4-nfsd/net/sunrpc/svcsock.c	2004-03-11 03:55:22.000000000 +0100
+++ linux-2.6.4/net/sunrpc/svcsock.c	2004-03-30 16:58:01.000000000 +0200
@@ -828,21 +828,33 @@
 
 	/* make sure that we don't have too many active connections.
 	 * If we have, something must be dropped.
-	 * We randomly choose between newest and oldest (in terms
-	 * of recent activity) and drop it.
+	 *
+	 * There's no point in trying to do random drop here for
+	 * DoS prevention. The NFS client does 1 reconnect in 15
+	 * seconds. An attacker can easily beat that.
+	 *
+	 * The only somewhat efficient mechanism would be to drop
+	 * old connections from the same IP first. But right now
+	 * we don't even record the client IP in svc_sock.
 	 */
-	if (serv->sv_tmpcnt > (serv->sv_nrthreads+3)*5) {
+	if (serv->sv_tmpcnt > (serv->sv_nrthreads+3)*20) {
 		struct svc_sock *svsk = NULL;
 		spin_lock_bh(&serv->sv_lock);
 		if (!list_empty(&serv->sv_tempsocks)) {
-			if (net_random()&1)
-				svsk = list_entry(serv->sv_tempsocks.prev,
-						  struct svc_sock,
-						  sk_list);
-			else
-				svsk = list_entry(serv->sv_tempsocks.next,
-						  struct svc_sock,
-						  sk_list);
+			if (net_ratelimit()) {
+				/* Try to help the admin */
+				printk(KERN_NOTICE "%s: too many open TCP sockets, consider "
+						   "increasing the number of threads\n",
+						   serv->sv_name);
+				printk(KERN_NOTICE "%s: last TCP connect from %u.%u.%u.%u:%d\n",
+							serv->sv_name, 
+							NIPQUAD(sin.sin_addr.s_addr),
+							ntohs(sin.sin_port));
+			}
+			/* Always select the oldest socket. It's not fair, but so is life */
+			svsk = list_entry(serv->sv_tempsocks.prev,
+					  struct svc_sock,
+					  sk_list);
 			set_bit(SK_CLOSE, &svsk->sk_flags);
 			svsk->sk_inuse ++;
 		}

[-- Attachment #3: sunrpc-verbose-disconnect --]
[-- Type: text/plain, Size: 445 bytes --]

--- linux-2.6.4/net/sunrpc/xprt.c.reconnect	2004-03-30 14:19:45.000000000 +0200
+++ linux-2.6.4/net/sunrpc/xprt.c	2004-03-30 15:42:04.000000000 +0200
@@ -1039,6 +1039,11 @@
 	case TCP_SYN_RECV:
 		break;
 	default:
+		if (net_ratelimit()) {
+			printk(KERN_NOTICE "NFS server %u.%u.%u.%u %s connection\n",
+					NIPQUAD(xprt->addr.sin_addr.s_addr),
+					xprt_connected(xprt)? "closed" : "refused");
+		}
 		xprt_disconnect(xprt);
 		break;
 	}

* RE: nfsd random drop
@ 2004-04-01 15:19 Lever, Charles
  0 siblings, 0 replies; 5+ messages in thread
From: Lever, Charles @ 2004-04-01 15:19 UTC (permalink / raw)
  To: Olaf Kirch; +Cc: nfs

quick comment --

i think a shorter wait in the client before attempting to
reconnect would improve the likelihood that your completed
but unreplied operations would still be in the server's
replay cache.

that might be a good additional change (if you haven't
suggested that already).



_______________________________________________
NFS maillist  -  NFS@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nfs


Thread overview: 5+ messages
2004-04-01 10:23 nfsd random drop Olaf Kirch
2004-04-01 16:14 ` "Peter Lojkin" 
2004-04-05  0:13 ` Neil Brown
2004-04-05  1:09   ` Trond Myklebust
  -- strict thread matches above, loose matches on Subject: below --
2004-04-01 15:19 Lever, Charles
