* 2.6.5-pre TCP connect problems
@ 2004-03-29 13:50 Olaf Kirch
2004-03-29 15:28 ` Olaf Kirch
0 siblings, 1 reply; 2+ messages in thread
From: Olaf Kirch @ 2004-03-29 13:50 UTC
To: nfs
Hi,
I'm currently debugging a problem with TCP reconnects in 2.6.5-pre where
the TCP reconnect code got rewritten to use worker queues. What happens
is that the NFS server drops the connection immediately and that state
change isn't propagated to the transport. The typical sequence of
events goes like this:
Mar 29 15:44:00 e36 kernel: RPC: 28611 call_transmit (status 0)
Mar 29 15:44:00 e36 kernel: RPC: 28611 xprt_prepare_transmit
Mar 29 15:44:00 e36 kernel: RPC: 28611 call_status (status -107)
Mar 29 15:44:00 e36 kernel: RPC: 28611 call_bind xprt d42c4200 is not connected
Mar 29 15:44:00 e36 kernel: RPC: 28611 call_connect status 0
Mar 29 15:44:00 e36 kernel: RPC: 28611 xprt_connect xprt d42c4200 is not connected
Mar 29 15:44:00 e36 kernel: RPC: disconnected transport d42c4200
Mar 29 15:44:00 e36 kernel: RPC: xprt_create_socket(tcp 6)
Mar 29 15:44:00 e36 kernel: RPC: d42c4200 connect status 115 connected 0 sock state 2
Mar 29 15:44:00 e36 kernel: RPC: tcp_state_change client d42c4200...
Mar 29 15:44:00 e36 kernel: RPC: state 1 conn 0 dead 0 zapped 0
Mar 29 15:44:00 e36 kernel: RPC: tcp_state_change client d42c4200...
Mar 29 15:44:00 e36 kernel: RPC: state 8 conn 1 dead 0 zapped 0
Mar 29 15:44:00 e36 kernel: RPC: disconnected transport d42c4200
Mar 29 15:44:00 e36 kernel: RPC: tcp_data_ready...
Mar 29 15:44:15 e36 kernel: RPC: 28611 call_transmit (status 0)
Mar 29 15:44:15 e36 kernel: RPC: 28611 xprt_prepare_transmit
Mar 29 15:44:15 e36 kernel: RPC: 28611 call_status (status -107)
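For readers following the trace, the numeric codes can be decoded with
a small lookup. This is only a sketch: it assumes the status values are
Linux errno numbers and that "sock state" / "state" are the kernel's TCP
state numbers. The table and function names are made up for
illustration and are not part of the RPC code.

```python
# Linux errno values, hardcoded so the result does not depend on the
# platform running this script. Only the codes seen in the trace are listed.
LINUX_ERRNO = {107: "ENOTCONN", 115: "EINPROGRESS"}

# TCP socket states as numbered by the Linux kernel.
TCP_STATES = {
    1: "TCP_ESTABLISHED", 2: "TCP_SYN_SENT", 3: "TCP_SYN_RECV",
    4: "TCP_FIN_WAIT1", 5: "TCP_FIN_WAIT2", 6: "TCP_TIME_WAIT",
    7: "TCP_CLOSE", 8: "TCP_CLOSE_WAIT", 9: "TCP_LAST_ACK",
    10: "TCP_LISTEN", 11: "TCP_CLOSING",
}


def decode_status(status):
    """Map an RPC status (0, or an errno with either sign) to a name."""
    if status == 0:
        return "OK"
    return LINUX_ERRNO.get(abs(status), "unknown")


def decode_tcp_state(state):
    """Map a kernel TCP state number to its symbolic name."""
    return TCP_STATES.get(state, "unknown")


if __name__ == "__main__":
    for status in (0, -107, 115):
        print(status, decode_status(status))
    for state in (2, 1, 8):
        print(state, decode_tcp_state(state))
```

Read this way, the trace shows a non-blocking connect in progress
(EINPROGRESS, SYN_SENT), the connection becoming established, and then
the socket moving to CLOSE_WAIT, i.e. the peer closed it, after which
the next transmit attempt fails with ENOTCONN.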
Does this sound familiar to anyone?
Olaf
--
Olaf Kirch | The Hardware Gods hate me.
okir@suse.de |
---------------+
-------------------------------------------------------
This SF.Net email is sponsored by: IBM Linux Tutorials
Free Linux tutorial presented by Daniel Robbins, President and CEO of
GenToo technologies. Learn everything from fundamentals to system
administration.http://ads.osdn.com/?ad_id=1470&alloc_id=3638&op=click
_______________________________________________
NFS maillist - NFS@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nfs
* Re: 2.6.5-pre TCP connect problems
2004-03-29 13:50 2.6.5-pre TCP connect problems Olaf Kirch
@ 2004-03-29 15:28 ` Olaf Kirch
0 siblings, 0 replies; 2+ messages in thread
From: Olaf Kirch @ 2004-03-29 15:28 UTC
To: nfs
On Mon, Mar 29, 2004 at 03:50:42PM +0200, Olaf Kirch wrote:
> I'm currently debugging a problem with TCP reconnects in 2.6.5-pre where
> the TCP reconnect code got rewritten to use worker queues. What happens
> is that the NFS server drops the connection immediately and that state
> change isn't propagated to the transport.
I debugged this a little more. The problem I've been seeing was
caused by too many TCP connections. The NFS server was dropping
connections randomly. "Randomly" here means that the newest connection
is dropped with a probability of 50%, so the connection can die
before the client has sent its first packet. This causes
the client to back off for 60 seconds.
I'm not sure why this effect wasn't visible with 2.6.4, but it
seems 2.6.4 used a shorter timeout (REESTABLISH_TIMEOUT = 15 sec)
when the connection was refused or dropped instantly, so the
problem may simply have been less noticeable.
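To put rough numbers on the difference, here is a back-of-the-envelope
sketch. It assumes a simplified model that is not taken from the kernel
code: every connection attempt is dropped independently with the same
probability, and the client waits a fixed backoff after each drop. The
function name is made up for illustration.

```python
def expected_reconnect_delay(drop_probability, backoff_seconds):
    """Expected total backoff before the first connection that survives.

    Assumes independent drops and a constant backoff per failed attempt.
    The number of failures before the first success is geometric, with
    mean p / (1 - p).
    """
    if not 0 <= drop_probability < 1:
        raise ValueError("drop_probability must be in [0, 1)")
    expected_failures = drop_probability / (1 - drop_probability)
    return expected_failures * backoff_seconds


if __name__ == "__main__":
    # 50% drop rate with the 60 second backoff described above
    print(expected_reconnect_delay(0.5, 60))  # 60.0
    # 50% drop rate with the 15 second timeout mentioned for 2.6.4
    print(expected_reconnect_delay(0.5, 15))  # 15.0
```

Under these assumptions the average stall grows from about 15 to about
60 seconds, which would fit the problem being much more visible now.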
I'm not sure if it's a good idea to be more aggressive about
reconnecting, but I think the client should at least log
a message to syslog that a connection attempt failed. Likewise,
the server should probably log a message when it finds it's
dropping too many TCP connections.
Finally, I think the way nfsd drops connections is bad. Dropping
the most recent connection doesn't prevent DoS, and as this example
demonstrates, it does unexpected things to your clients.
Olaf
--
Olaf Kirch | The Hardware Gods hate me.
okir@suse.de |
---------------+