re: [PATCH] new timeout behavior for RPC requests on TCP sockets

public inbox for linux-kernel@vger.kernel.org

* re: [PATCH] new timeout behavior for RPC requests on TCP sockets
From: Dan Kegel @ 2002-11-12 23:48 UTC
  To: Chuck Lever, Linux Kernel Mailing List

Chuck wrote:
> make RPC timeout behavior over TCP sockets behave more like reference
> client implementations.  reference behavior is to transmit the same
> request three times at 60 second intervals; if there is no response, close
> and reestablish the socket connection.  we modify the Linux RPC client as
> follows:
> 
> +  after a minor retransmit timeout, use the same timeout value when
>    retrying on a TCP socket rather than doubling the value
> +  after a major retransmit timeout, close the socket and attempt
>    to reestablish a fresh TCP connection
> 
> note that today mount uses a 6 second timeout with 5 retries for NFS over
> TCP by default; proper default behavior is 2 retries each with 60 second
> timeouts.  a separate patch for mount is pending.

Chuck, can you briefly explain why RPC does any minor
retransmits at all over TCP?
Shouldn't TCP's natural retransmit take care of that?
- Dan


* re: [PATCH] new timeout behavior for RPC requests on TCP sockets
From: Chuck Lever @ 2002-11-13 15:58 UTC
  To: Dan Kegel; +Cc: Linux Kernel Mailing List

On Tue, 12 Nov 2002, Dan Kegel wrote:

> Chuck wrote:
> > make RPC timeout behavior over TCP sockets behave more like reference
> > client implementations.  reference behavior is to transmit the same
> > request three times at 60 second intervals; if there is no response, close
> > and reestablish the socket connection.  we modify the Linux RPC client as
> > follows:
> >
> > +  after a minor retransmit timeout, use the same timeout value when
> >    retrying on a TCP socket rather than doubling the value
> > +  after a major retransmit timeout, close the socket and attempt
> >    to reestablish a fresh TCP connection
> >
> > note that today mount uses a 6 second timeout with 5 retries for NFS over
> > TCP by default; proper default behavior is 2 retries each with 60 second
> > timeouts.  a separate patch for mount is pending.
>
> Chuck, can you briefly explain why RPC does any minor
> retransmits at all over TCP?
> Shouldn't TCP's natural retransmit take care of that?

the socket layer guarantees delivery only to the RPC server application...
if the application itself chooses to drop the request, an RPC retransmit
is still required.

	- Chuck Lever
--
corporate:	<cel at netapp dot com>
personal:	<chucklever at bigfoot dot com>

* re: [PATCH] new timeout behavior for RPC requests on TCP sockets
From: Richard B. Johnson @ 2002-11-13 16:44 UTC
  To: Chuck Lever; +Cc: Dan Kegel, Linux Kernel Mailing List

On Wed, 13 Nov 2002, Chuck Lever wrote:

> On Tue, 12 Nov 2002, Dan Kegel wrote:
> 
> > Chuck wrote:
> > > make RPC timeout behavior over TCP sockets behave more like reference
> > > client implementations.  reference behavior is to transmit the same
> > > request three times at 60 second intervals; if there is no response, close
> > > and reestablish the socket connection.  we modify the Linux RPC client as
> > > follows:
> > >
> > > +  after a minor retransmit timeout, use the same timeout value when
> > >    retrying on a TCP socket rather than doubling the value
> > > +  after a major retransmit timeout, close the socket and attempt
> > >    to reestablish a fresh TCP connection
> > >
> > > note that today mount uses a 6 second timeout with 5 retries for NFS over
> > > TCP by default; proper default behavior is 2 retries each with 60 second
> > > timeouts.  a separate patch for mount is pending.
> >
> > Chuck, can you briefly explain why RPC does any minor
> > retransmits at all over TCP?
> > Shouldn't TCP's natural retransmit take care of that?
> 
> the socket layer guarantees delivery only to the RPC server application...
> if the application itself chooses to drop the request, an RPC retransmit
> is still required.
> 
> 	- Chuck Lever
> --

If the application "chooses to drop the request", the kernel is not
required to fix that application. The RPC cannot retransmit if
it has been shut-down or disconnected, which is about the only
way the application could "choose to drop the request". So something
doesn't smell right here.


Cheers,
Dick Johnson
Penguin : Linux version 2.4.18 on an i686 machine (797.90 BogoMips).
   Bush : The Fourth Reich of America

* Re: [PATCH] new timeout behavior for RPC requests on TCP sockets
From: Trond Myklebust @ 2002-11-13 16:49 UTC
  To: root; +Cc: Chuck Lever, Dan Kegel, Linux Kernel Mailing List

>>>>> " " == Richard B Johnson <root@chaos.analogic.com> writes:

     > If the application "chooses to drop the request", the kernel is
     > not required to fix that application. The RPC cannot retransmit
     > if it has been shut-down or disconnected, which is about the
     > only way the application could "choose to drop the request". So
     > something doesn't smell right here.

An NFS server is perfectly free to drop an RPC request if it doesn't
have the necessary free resources to service it (i.e. if it is out of
memory). If the client doesn't time out + retry, you lose data. Not a
good idea...

Cheers,
  Trond

* Re: [PATCH] new timeout behavior for RPC requests on TCP sockets
From: Richard B. Johnson @ 2002-11-13 18:38 UTC
  To: Trond Myklebust; +Cc: Chuck Lever, Dan Kegel, Linux Kernel Mailing List

On 13 Nov 2002, Trond Myklebust wrote:

> >>>>> " " == Richard B Johnson <root@chaos.analogic.com> writes:
> 
>      > If the application "chooses to drop the request", the kernel is
>      > not required to fix that application. The RPC cannot retransmit
>      > if it has been shut-down or disconnected, which is about the
>      > only way the application could "choose to drop the request". So
>      > something doesn't smell right here.
> 
> An NFS server is perfectly free to drop an RPC request if it doesn't
> have the necessary free resources to service it (i.e. if it is out of
> memory). If the client doesn't time out + retry, you lose data. Not a
> good idea...
> 
> Cheers,
>   Trond

The Client is the guy that just retries, as you say from a time-out.
This shouldn't affect any internal TCP/IP code. The time-out is
at the application (client) level. It sent a request, the data
was sent or promised to be sent because the write() or send() didn't
block, now it expects to get the data it asked for. It waits, nothing
happens. It times-out and sends the exact same request again.

Cheers,
Dick Johnson

* Re: [PATCH] new timeout behavior for RPC requests on TCP sockets
From: Trond Myklebust @ 2002-11-14 15:41 UTC
  To: root; +Cc: Chuck Lever, Dan Kegel, Linux Kernel Mailing List

>>>>> " " == Richard B Johnson <root@chaos.analogic.com> writes:

     > The Client is the guy that just retries, as you say from a
     > time-out.  This shouldn't affect any internal TCP/IP code. The
     > time-out is at the application (client) level. It sent a
     > request, the data was sent or promised to be sent because the
     > write() or send() didn't block, now it expects to get the data
     > it asked for. It waits, nothing happens. It times-out and sends
     > the exact same request again.

Huh??? There's no 'application level' involved here at all, nor any
'internal TCP/IP code'.

Chuck's patch touches the way the kernel Sun RPC client code (as used
exclusively by the kernel NFS client and the kernel NLM client)
handles the generic case of message timeout + resend. Why would we
want to even consider pushing that sort of thing down into the NFS
code itself?

Cheers,
  Trond

* Re: [PATCH] new timeout behavior for RPC requests on TCP sockets
From: Richard B. Johnson @ 2002-11-14 18:36 UTC
  To: Trond Myklebust; +Cc: Chuck Lever, Dan Kegel, Linux Kernel Mailing List

On 14 Nov 2002, Trond Myklebust wrote:

> >>>>> " " == Richard B Johnson <root@chaos.analogic.com> writes:
> 
>      > The Client is the guy that just retries, as you say from a
>      > time-out.  This shouldn't affect any internal TCP/IP code. The
>      > time-out is at the application (client) level. It sent a
>      > request, the data was sent or promised to be sent because the
>      > write() or send() didn't block, now it expects to get the data
>      > it asked for. It waits, nothing happens. It times-out and sends
>      > the exact same request again.
> 
> Huh??? There's no 'application level' involved here at all, nor any
> 'internal TCP/IP code'.
> 
> Chuck's patch touches the way the kernel Sun RPC client code (as used
> exclusively by the kernel NFS client and the kernel NLM client)
> handles the generic case of message timeout + resend. Why would we
> want to even consider pushing that sort of thing down into the NFS
> code itself?
> 
> Cheers,
>   Trond
> 

Because all of the RPC stuff was, initially, user-mode code. For
performance reasons or otherwise, it was moved into the kernel.
Okay, so far? Now, when something goes wrong with that code, should
that code be fixed, or should the unrelated TCP/IP code be modified
to accommodate? I think the time-outs should be put at the correct
places and not added to generic network code.

Once the client side gets a buffer of data from the TCP/IP stack,
the TCP/IP stack should not care (ever) what it does with it. Putting
the timeout(s) in the TCP/IP stack makes it care, and adds code
to accommodate.

Cheers,
Dick Johnson

* Re: [PATCH] new timeout behavior for RPC requests on TCP sockets
From: Trond Myklebust @ 2002-11-14 19:33 UTC
  To: root; +Cc: Trond Myklebust, Chuck Lever, Dan Kegel,
	Linux Kernel Mailing List

>>>>> " " == Richard B Johnson <root@chaos.analogic.com> writes:


     > Because all of the RPC stuff was, initially, user-mode
     > code. For performance reasons or otherwise, it was moved into
     > the kernel.  Okay, so far? Now, when something goes wrong with
     > that code, should that code be fixed, or should the unrelated
     > TCP/IP code be modified to accommodate? I think the time-outs
     > should be put at the correct places and not added to generic
     > network code.

No. The kernel RPC code has never been user mode code, nor has it ever
been exported to userland. It exists purely for the benefit of NFS and
friends. It is located in a subdirectory of the network code, but it
is certainly not 'generic network code'.

Cheers,
  Trond

* Re: [PATCH] new timeout behavior for RPC requests on TCP sockets
From: Chuck Lever @ 2002-11-14 20:26 UTC
  To: Richard B. Johnson; +Cc: Linux Kernel Mailing List

On Thu, 14 Nov 2002, Richard B. Johnson wrote:

> Because all of the RPC stuff was, initially, user-mode code.

if you mean ti-rpc, that stuff comes from sun.  the linux kernel ONC/RPC
implementation is not based on the ti-rpc code because, being Transport
Independent, ti-rpc is less than optimally efficient.  also, it is
covered by a restrictive license agreement, so that code base can't be
included in the linux kernel.

> Now, when something goes wrong with that code, should
> that code be fixed, or should the unrelated TCP/IP code be modified
> to accommodate?

obviously the RPC client should be fixed....

> I think the time-outs should be put at the correct
> places and not added to generic network code.

...which is exactly what i did.

the new RPC retransmission logic is in net/sunrpc/clnt.c:call_timeout,
which is strictly a part of the RPC client's finite state machine.
underlying TCP retransmit behavior is not changed by this patch.  the
changes apply to the RPC client only, which resides above the socket
layer.

let me go over the changes again.  the RPC client sets a timeout after
sending each request.  if it doesn't receive a valid reply for a request
within the timeout interval, a "minor" timeout occurs.  after each
timeout, the RPC client doubles the timeout interval until it reaches a
maximum value.

for RPC over UDP, short timeouts and retransmission back-off make sense.
for TCP, retransmission is built into the underlying protocol, so it makes
more sense to use a constant long retransmit timeout.
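
A minimal sketch of the two back-off policies described above, assuming the current timeout and a fixed ceiling are all that matter. next_timeout() and the 60-unit ceiling used in the example are illustrative names and values, not code from net/sunrpc:

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical model of the per-retransmit timeout choice.  Units are
 * arbitrary (e.g. seconds); max_timeout is where UDP doubling stops.
 * This is not the kernel's xprt_adjust_timeout(), only the policy.
 */
static unsigned long next_timeout(unsigned long cur, unsigned long max_timeout,
				  bool stream)
{
	assert(cur > 0 && cur <= max_timeout);

	if (stream)			/* TCP: keep the same long timeout */
		return cur;

	/* UDP: exponential back-off, capped at max_timeout (no overflow) */
	if (cur > max_timeout / 2)
		return max_timeout;
	return cur * 2;
}
```

For example, next_timeout(6, 60, false) gives 12 and next_timeout(48, 60, false) is capped at 60, while next_timeout(6, 60, true) stays at 6.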

a "major" timeout occurs after several "minor" timeouts.  this is an
ad-hoc mechanism for detecting that a server is actually down, rather than
just a few requests have been lost.  a "server not responding" message in
the kernel log appears when a major timeout occurs.

for UDP, there is no way a client can tell the server has gone away except
by noticing that the server is not sending any replies.  TCP sockets
require a bit more cleanup when one end dies, however, since both ends
maintain some connection state.

i've changed the RPC client's timeout behavior when it uses a TCP socket
rather than a UDP socket to connect to a server:

+  after a minor RPC retransmit timeout on a TCP socket, the RPC client
   uses the same retransmit timeout value when retransmitting the request
   rather than doubling it, as it would on a UDP socket.

+  after a major RPC retransmit timeout on a TCP socket, close the socket.
   the RPC finite state machine will notice the socket is no longer
   connected, and attempt to reestablish a connection when it retries
   the request again.

this means that after a few retransmissions, the RPC client closes the
transport socket.  if a server hasn't responded after several retransmissions,
the client now assumes that it has crashed and has lost all connection
state, so it will reestablish a fresh connection with the server.

this behavior is recommended for NFSv2 and v3 over TCP, and is required
for NFSv4 over TCP (RFC3010).

	- Chuck Lever

* Re: [PATCH] new timeout behavior for RPC requests on TCP sockets
From: Richard B. Johnson @ 2002-11-14 20:37 UTC
  To: Chuck Lever; +Cc: Linux Kernel Mailing List

On Thu, 14 Nov 2002, Chuck Lever wrote:

> On Thu, 14 Nov 2002, Richard B. Johnson wrote:
> 
> > Because all of the RPC stuff was, initially, user-mode code.
> 
> if you mean ti-rpc, that stuff comes from sun.  the linux kernel ONC/RPC
> implementation is not based on the ti-rpc code because, being Transport
> Independent, ti-rpc is less than optimally efficient.  also, it is
> covered by a restrictive license agreement, so that code base can't be
> included in the linux kernel.
> 
> > Now, when something goes wrong with that code, should
> > that code be fixed, or should the unrelated TCP/IP code be modified
> > to accommodate?
> 
> obviously the RPC client should be fixed....
> 
> > I think the time-outs should be put at the correct
> > places and not added to generic network code.
> 
> ...which is exactly what i did.
> 
> the new RPC retransmission logic is in net/sunrpc/clnt.c:call_timeout,
> which is strictly a part of the RPC client's finite state machine.
> underlying TCP retransmit behavior is not changed by this patch.  the
> changes apply to the RPC client only, which resides above the socket
> layer.
> 
> let me go over the changes again.  the RPC client sets a timeout after
> sending each request.  if it doesn't receive a valid reply for a request
> within the timeout interval, a "minor" timeout occurs.  after each
> timeout, the RPC client doubles the timeout interval until it reaches a
> maximum value.
> 
> for RPC over UDP, short timeouts and retransmission back-off make sense.
> for TCP, retransmission is built into the underlying protocol, so it makes
> more sense to use a constant long retransmit timeout.
> 
> a "major" timeout occurs after several "minor" timeouts.  this is an
> ad-hoc mechanism for detecting that a server is actually down, rather than
> just a few requests have been lost.  a "server not responding" message in
> the kernel log appears when a major timeout occurs.
> 
> for UDP, there is no way a client can tell the server has gone away except
> by noticing that the server is not sending any replies.  TCP sockets
> require a bit more cleanup when one end dies, however, since both ends
> maintain some connection state.
> 
> i've changed the RPC client's timeout behavior when it uses a TCP socket
> rather than a UDP socket to connect to a server:
> 
> +  after a minor RPC retransmit timeout on a TCP socket, the RPC client
>    uses the same retransmit timeout value when retransmitting the request
>    rather than doubling it, as it would on a UDP socket.
> 
> +  after a major RPC retransmit timeout on a TCP socket, close the socket.
>    the RPC finite state machine will notice the socket is no longer
>    connected, and attempt to reestablish a connection when it retries
>    the request again.
> 
> this means that after a few retransmissions, the RPC client closes the
> transport socket.  if a server hasn't responded after several retransmissions,
> the client now assumes that it has crashed and has lost all connection
> state, so it will reestablish a fresh connection with the server.
> 
> this behavior is recommended for NFSv2 and v3 over TCP, and is required
> for NFSv4 over TCP (RFC3010).
> 
> 	- Chuck Lever
> --
> corporate:	<cel at netapp dot com>
> personal:	<chucklever at bigfoot dot com>
> 
> 

Okay. Thanks a lot for the complete explanation. The early
information about the patch, and the patch itself that I tried
to follow, seemed to show that new retransmit timer behavior
was applied at the TCP/IP level (actually socket level). This
is what I was bitching about.


Cheers,
Dick Johnson

* Re: [PATCH] new timeout behavior for RPC requests on TCP sockets
From: Chuck Lever @ 2002-11-14 21:05 UTC
  To: Richard B. Johnson; +Cc: Linux Kernel Mailing List

On Thu, 14 Nov 2002, Richard B. Johnson wrote:

> Because all of the RPC stuff was, initially, user-mode code.

if you mean ti-rpc, that stuff comes from sun.  the linux kernel ONC/RPC
implementation is not based on the ti-rpc code because, being Transport
Independent, ti-rpc is less than optimally efficient.  also, it is
covered by a restrictive license agreement, so that code base can't be
included in the linux kernel.

> Now, when something goes wrong with that code, should
> that code be fixed, or should the unrelated TCP/IP code be modified
> to accommodate?

obviously the RPC client should be fixed....

> I think the time-outs should be put at the correct
> places and not added to generic network code.

...which is exactly what i did.

the new RPC retransmission logic is in net/sunrpc/clnt.c:call_timeout,
which is strictly a part of the RPC client's finite state machine.
underlying TCP retransmit behavior is not changed by this patch.  the
changes apply to the RPC client only, which resides above the socket
layer.

let me go over the changes again.  the RPC client sets a timeout after
sending each request.  if it doesn't receive a valid reply for a request
within the timeout interval, a "minor" timeout occurs.  after each
timeout, the RPC client doubles the timeout interval until it reaches a
maximum value.

for RPC over UDP, short timeouts and retransmission back-off make sense.
for TCP, retransmission is built into the underlying protocol, so it makes
more sense to use a constant long retransmit timeout.

a "major" timeout occurs after several "minor" timeouts.  this is an
ad-hoc mechanism for detecting that a server is actually down, rather than
just a few requests have been lost.  a "server not responding" message in
the kernel log appears when a major timeout occurs.

for UDP, there is no way a client can tell the server has gone away except
by noticing that the server is not sending any replies.  TCP sockets
require a bit more cleanup when one end dies, however, since both ends
maintain some connection state.

i've changed the RPC client's timeout behavior when it uses a TCP socket
rather than a UDP socket to connect to a server:

+  after a minor RPC retransmit timeout on a TCP socket, the RPC client
   uses the same retransmit timeout value when retransmitting the request
   rather than doubling it, as it would on a UDP socket.

+  after a major RPC retransmit timeout on a TCP socket, close the socket.
   the RPC finite state machine will notice the socket is no longer
   connected, and attempt to reestablish a connection when it retries
   the request again.

this means that after a few retransmissions, the RPC client closes the
transport socket.  if a server hasn't responded after several retransmissions,
the client now assumes that it has crashed and has lost all connection
state, so it will reestablish a fresh connection with the server.

this behavior is recommended for NFSv2 and v3 over TCP, and is required
for NFSv4 over TCP (RFC3010).

	- Chuck Lever

* re: [PATCH] new timeout behavior for RPC requests on TCP sockets
From: Alan Cox @ 2002-11-13 17:42 UTC
  To: root; +Cc: Chuck Lever, Dan Kegel, Linux Kernel Mailing List

On Wed, 2002-11-13 at 16:44, Richard B. Johnson wrote:
> If the application "chooses to drop the request", the kernel is not
> required to fix that application. The RPC cannot retransmit if
> it has been shut-down or disconnected, which is about the only
> way the application could "choose to drop the request". So something
> doesn't smell right here.

Check your socks...

As far as RPC goes the RPC server can choose to drop a request whenever
it pleases by simply throwing it away (eg reading it from the socket and
binning it) depending on its workload. There are actually reasons for
that in some situations (eg if the top requests are all for a volume
that is down it's better to throw them away so you can get requests for a
volume that is functional)
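
A small user-space illustration of that point, assuming an AF_UNIX stream socketpair in place of a TCP connection so it runs without a network. The "server" reads the first request off the socket and bins it; the stream delivered every byte, yet no reply arrives, so the "client" times out in poll() and resends the identical request. The request text and drop_then_retry_demo() are made up for the example:

```c
#include <assert.h>
#include <poll.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * One process plays both ends of a connected stream socket pair.
 * The "server" reads the first request and throws it away; the
 * "client" sees no reply within timeout_ms and sends the same bytes
 * again.  Returns how many times the client transmitted, or -1.
 */
static int drop_then_retry_demo(int timeout_ms)
{
	const char req[] = "READ xid=42";	/* made-up request */
	char buf[64];
	int sv[2];
	int sends = 0;

	assert(timeout_ms >= 0);
	if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0)
		return -1;

	for (int attempt = 0; attempt < 3; attempt++) {
		/* client: (re)transmit the identical request */
		if (write(sv[0], req, sizeof(req)) != (ssize_t)sizeof(req))
			break;
		sends++;

		/* server: bytes arrived, but it only answers the second copy */
		if (read(sv[1], buf, sizeof(buf)) <= 0)
			break;
		if (attempt > 0 && write(sv[1], "OK", 2) != 2)
			break;

		/* client: wait for a reply; on timeout, loop and retransmit */
		struct pollfd pfd = { .fd = sv[0], .events = POLLIN };
		if (poll(&pfd, 1, timeout_ms) > 0 && read(sv[0], buf, sizeof(buf)) > 0)
			break;		/* reply received */
	}
	close(sv[0]);
	close(sv[1]);
	return sends;
}
```

drop_then_retry_demo(50) returns 2: one transmission that was read and discarded, and one retransmission that was answered.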

Alan

* re: [PATCH] new timeout behavior for RPC requests on TCP sockets
From: Richard B. Johnson @ 2002-11-13 18:33 UTC
  To: Alan Cox; +Cc: Chuck Lever, Dan Kegel, Linux Kernel Mailing List

On 13 Nov 2002, Alan Cox wrote:

> On Wed, 2002-11-13 at 16:44, Richard B. Johnson wrote:
> > If the application "chooses to drop the request", the kernel is not
> > required to fix that application. The RPC cannot retransmit if
> > it has been shut-down or disconnected, which is about the only
> > way the application could "choose to drop the request". So something
> > doesn't smell right here.
> 
> Check your socks...
> 
> As far as RPC goes the RPC server can choose to drop a request whenever
> it pleases by simply throwing it away (eg reading it from the socket and
> binning it) depending on its workload. There are actually reasons for
> that in some situations (eg if the top requests are all for a volume
> that is down it's better to throw them away so you can get requests for a
> volume that is functional)
> 
> Alan

Yes! But the client is perfectly free to request it again, and
should (must) do so to keep any mounted volumes intact. This doesn't
affect the internal TCP/IP stack (or it shouldn't). Since the whole
NFS thing is "stateless", the client just issues another request.


Cheers,
Dick Johnson

* [PATCH] new timeout behavior for RPC requests on TCP sockets
From: Chuck Lever @ 2002-11-26 19:57 UTC
  To: Linus Torvalds; +Cc: Linux Kernel Mailing List, Linux NFS List

Description:
  make RPC timeout behavior over TCP sockets behave more like reference
  client implementations.  reference behavior is to transmit the same
  request three times at 60 second intervals; if there is no response,
  close and reestablish the socket connection.  note that this patch
  provides a way to support NFSv4 (RFC3010) reliable stream
  retransmission behavior as well.

  we modify the Linux RPC client as follows:

+  after a minor RPC retransmit timeout, the RPC client uses the same
   retransmit timeout value when retransmitting the request rather than
   doubling the value, as it would on a UDP socket.

+  after a major RPC retransmit timeout, close the socket.  the RPC
   finite state machine will notice the socket is no longer connected,
   and attempt to reestablish a fresh TCP connection when it retries
   the request again.

  today, mount uses a 6 second timeout with 5 retries for NFS over
  TCP by default; proper default behavior is 2 retries each with 60
  second timeouts.  a separate patch for mount is pending, but in the
  meantime, sysadmins can use "timeo=600,retrans=2" to get standard
  behavior for NFSv2 and NFSv3 over TCP.  for NFSv4, which does not
  use RPC retransmits over reliable network connections (RFC3010), use
  "timeo=600,retrans=0".
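
As a worked example of those options: timeo is given in tenths of a second, and with a constant timeout the request is sent once plus retrans more times before a major timeout. The helper below is only that arithmetic, with a made-up name; it is not part of mount or the kernel:

```c
#include <assert.h>

/*
 * Time, in tenths of a second, a request can wait before a major
 * timeout when every retransmit uses the same timeout (the TCP
 * behavior in this patch).  timeo is in tenths of a second, as in
 * the mount option; the request is transmitted 1 + retrans times.
 */
static unsigned long deciseconds_to_major(unsigned long timeo, unsigned long retrans)
{
	assert(timeo > 0);
	return (retrans + 1) * timeo;
}
```

So timeo=600,retrans=2 allows 1800 tenths (three 60 second waits, 180 seconds) before the socket is closed, timeo=600,retrans=0 allows 60 seconds, and today's 6 second timeout with 5 retries (timeo=60,retrans=5) would give up on the connection after only 36 seconds.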

Apply Against:
  2.5.49

Test status:
  Pull ethernet cable and watch RPC debug output.  Passes Connectathon
  '02 and other stress tests.  Tested with NFSv2 and NFSv3.

diff -Naur 04-connect3/include/linux/sunrpc/xprt.h 05-timeout/include/linux/sunrpc/xprt.h
--- 04-connect3/include/linux/sunrpc/xprt.h	Mon Nov 25 13:24:29 2002
+++ 05-timeout/include/linux/sunrpc/xprt.h	Mon Nov 25 13:26:36 2002
@@ -182,9 +182,10 @@
 void			xprt_reserve(struct rpc_task *);
 void			xprt_transmit(struct rpc_task *);
 void			xprt_receive(struct rpc_task *);
-int			xprt_adjust_timeout(struct rpc_timeout *);
+void			xprt_adjust_timeout(struct rpc_timeout *);
 void			xprt_release(struct rpc_task *);
 void			xprt_connect(struct rpc_task *);
+void			xprt_disconnect(struct rpc_xprt *);
 int			xprt_clear_backlog(struct rpc_xprt *);
 void			xprt_sock_setbufsize(struct rpc_xprt *);
 
diff -Naur 04-connect3/net/sunrpc/clnt.c 05-timeout/net/sunrpc/clnt.c
--- 04-connect3/net/sunrpc/clnt.c	Mon Nov 25 13:24:29 2002
+++ 05-timeout/net/sunrpc/clnt.c	Mon Nov 25 13:26:36 2002
@@ -656,6 +656,9 @@
 		if (clnt->cl_autobind)
 			clnt->cl_port = 0;
 		task->tk_action = call_bind;
+		/* A disconnect can happen after only part of an RPC was
+		 * sent on a TCP socket.  send all of this request again */
+		req->rq_bytes_sent = 0;
 		break;
 	case -EAGAIN:
 		task->tk_action = call_transmit;
@@ -677,20 +680,34 @@
  * 6a.	Handle RPC timeout
  * 	We do not release the request slot, so we keep using the
  *	same XID for all retransmits.
+ *	For stream transports, shut down the transport socket when
+ *	a request sees a major time out.  When any request on this
+ *	connection is retried, the FSM notices the socket has been
+ *	shut down, and attempts to reconnect.
  */
 static void
 call_timeout(struct rpc_task *task)
 {
 	struct rpc_clnt	*clnt = task->tk_client;
-	struct rpc_timeout *to = &task->tk_rqstp->rq_timeout;
+	struct rpc_xprt *xprt = clnt->cl_xprt;
+	struct rpc_rqst *req = task->tk_rqstp;
+	struct rpc_timeout *to = &req->rq_timeout;
 
-	if (xprt_adjust_timeout(to)) {
-		dprintk("RPC: %4d call_timeout (minor)\n", task->tk_pid);
+	if (!xprt->stream)
+		xprt_adjust_timeout(to);
+
+	if (to->to_retries--) {
+		dprintk("RPC: %4d call_timeout (minor, retries=%d)\n",
+				task->tk_pid, to->to_retries);
 		goto retry;
 	}
-	to->to_retries = clnt->cl_timeout.to_retries;
+	to->to_retries = xprt->timeout.to_retries;
 
 	dprintk("RPC: %4d call_timeout (major)\n", task->tk_pid);
+
+	if (xprt->stream)
+		xprt_disconnect(xprt);
+
 	if (clnt->cl_softrtry) {
 		if (clnt->cl_chatty && !task->tk_exit)
 			printk(KERN_NOTICE "%s: server %s not responding, timed out\n",
@@ -699,15 +716,18 @@
 		return;
 	}
 
-	if (clnt->cl_chatty && !(task->tk_flags & RPC_CALL_MAJORSEEN) && rpc_ntimeo(&clnt->cl_rtt) > 7) {
-		task->tk_flags |= RPC_CALL_MAJORSEEN;
-		printk(KERN_NOTICE "%s: server %s not responding, still trying\n",
-			clnt->cl_protname, clnt->cl_server);
+	if (clnt->cl_chatty && !(task->tk_flags & RPC_CALL_MAJORSEEN)) {
+		if (xprt->stream || (rpc_ntimeo(&clnt->cl_rtt) > 7)) {
+			task->tk_flags |= RPC_CALL_MAJORSEEN;
+			printk(KERN_NOTICE "%s: server %s not responding, still trying\n",
+				clnt->cl_protname, clnt->cl_server);
+		}
 	}
 	if (clnt->cl_autobind)
 		clnt->cl_port = 0;
 
 retry:
+	req->rq_bytes_sent = 0;		/* send all of this request again */
 	clnt->cl_stats->rpcretrans++;
 	task->tk_action = call_bind;
 	task->tk_status = 0;
diff -Naur 04-connect3/net/sunrpc/xprt.c 05-timeout/net/sunrpc/xprt.c
--- 04-connect3/net/sunrpc/xprt.c	Mon Nov 25 13:24:29 2002
+++ 05-timeout/net/sunrpc/xprt.c	Mon Nov 25 13:26:36 2002
@@ -85,7 +85,6 @@
 static void	xprt_request_init(struct rpc_task *, struct rpc_xprt *);
 static void	do_xprt_transmit(struct rpc_task *);
 static inline void	do_xprt_reserve(struct rpc_task *);
-static void	xprt_disconnect(struct rpc_xprt *);
 static void	xprt_conn_status(struct rpc_task *task);
 static struct rpc_xprt * xprt_setup(int proto, struct sockaddr_in *ap,
 						struct rpc_timeout *to);
@@ -336,7 +335,7 @@
 /*
  * Adjust timeout values etc for next retransmit
  */
-int
+void
 xprt_adjust_timeout(struct rpc_timeout *to)
 {
 	if (to->to_retries > 0) {
@@ -362,7 +361,7 @@
 	}
 	pprintk("RPC: %lu %s\n", jiffies,
 			to->to_retries? "retrans" : "timeout");
-	return to->to_retries-- > 0;
+	return;
 }
 
 /*
@@ -394,7 +393,7 @@
 /*
  * Mark a transport as disconnected
  */
-static void
+void
 xprt_disconnect(struct rpc_xprt *xprt)
 {
 	dprintk("RPC:      disconnected transport %p\n", xprt);


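The control flow of the patched call_timeout() is easier to follow in isolation, so here is a small user-space model of it. All names here (struct model_req, struct model_xprt, on_timeout(), enum timeout_action) are hypothetical stand-ins, not kernel API. They only mirror the fields the patch touches: rq_timeout.to_retries, rq_bytes_sent, xprt->stream and xprt->timeout.to_retries. The UDP backoff in xprt_adjust_timeout() and the "still trying" warning logic are deliberately left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical outcomes of one timeout in this model. */
enum timeout_action {
	ACT_RETRY_MINOR,	/* retransmit with the same XID */
	ACT_RETRY_MAJOR,	/* hard mount: keep trying after a major timeout */
	ACT_FAIL		/* soft mount: give up */
};

struct model_req {
	unsigned int retries_left;	/* plays the role of rq_timeout.to_retries */
	unsigned int bytes_sent;	/* plays the role of rq_bytes_sent */
};

struct model_xprt {
	bool stream;			/* TCP (true) or UDP (false) */
	bool connected;
	unsigned int max_retries;	/* plays the role of xprt->timeout.to_retries */
};

static enum timeout_action on_timeout(struct model_req *req,
				      struct model_xprt *xprt, bool soft)
{
	assert(req != NULL && xprt != NULL);

	if (req->retries_left > 0) {
		/* Minor timeout: spend one retry and retransmit. */
		req->retries_left--;
		req->bytes_sent = 0;	/* resend the whole request */
		return ACT_RETRY_MINOR;
	}

	/* Major timeout: refill the retry budget for the next cycle. */
	req->retries_left = xprt->max_retries;
	if (xprt->stream)
		xprt->connected = false;	/* next transmit must reconnect */
	if (soft)
		return ACT_FAIL;	/* the soft path skips the retry: label */
	req->bytes_sent = 0;
	return ACT_RETRY_MAJOR;
}
```

With two retries, a hard mount cycles minor, minor, major indefinitely, while a soft mount gives up at its first major timeout. Over TCP, every major timeout also drops the connection, and this happens before the soft/hard check, just as xprt_disconnect() precedes the cl_softrtry test in the patch.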

* [PATCH] new timeout behavior for RPC requests on TCP sockets
@ 2002-11-12 23:15 Chuck Lever
  0 siblings, 0 replies; 15+ messages in thread
From: Chuck Lever @ 2002-11-12 23:15 UTC
  To: Linus Torvalds; +Cc: Linux NFS List, Linux Kernel Mailing List

make RPC timeout behavior over TCP sockets more like that of reference
client implementations.  reference behavior is to transmit the same
request three times (the initial send plus two retries) at 60-second
intervals; if there is no response, close and reestablish the socket
connection.  we modify the Linux RPC client as follows:

+  after a minor retransmit timeout, use the same timeout value when
   retrying on a TCP socket rather than doubling the value
+  after a major retransmit timeout, close the socket and attempt
   to reestablish a fresh TCP connection

note that today mount uses a 6-second timeout with 5 retries for NFS over
TCP by default; proper default behavior is 2 retries, each with a 60-second
timeout.  a separate patch for mount is pending.
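The timing described above can be checked with a simplified model of one request's timeout schedule. struct timeout_policy and time_to_major_timeout() are hypothetical. The datagram branch uses a plain doubling-with-cap rule, which is an assumption rather than the exact xprt_adjust_timeout() arithmetic; the kernel's UDP path also consults an RTT estimator (clnt->cl_rtt). Values are in seconds, not jiffies.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified stand-in for struct rpc_timeout; all
 * values are in seconds rather than jiffies. */
struct timeout_policy {
	unsigned long initval;    /* first retransmit timeout */
	unsigned long maxval;     /* cap applied when backing off */
	unsigned int  retries;    /* minor timeouts before a major one */
	bool          exponential;
};

/*
 * Seconds from the first transmission of a request until its major
 * timeout fires.  Each request is sent retries + 1 times.  Stream
 * transports keep the interval fixed, as the patch does by skipping
 * xprt_adjust_timeout(); datagram transports double it (capped) under
 * this simplified rule.
 */
static unsigned long time_to_major_timeout(const struct timeout_policy *p,
					   bool stream)
{
	unsigned long interval = p->initval;
	unsigned long total = 0;
	unsigned int i;

	assert(p->initval > 0);
	for (i = 0; i <= p->retries; i++) {
		total += interval;
		if (!stream && p->exponential) {
			interval *= 2;
			if (interval > p->maxval)
				interval = p->maxval;
		}
	}
	return total;
}
```

In this model, the reference TCP policy (60 s, 2 retries) reaches a major timeout after 180 s. The current mount default of 6 s with 5 retries gives 36 s over TCP once the interval is no longer doubled. Under the doubling rule with an assumed 60 s cap, it gives 210 s.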

against 2.5.47.


diff -ruN 10-connect3/include/linux/sunrpc/xprt.h 11-timeout/include/linux/sunrpc/xprt.h
--- 10-connect3/include/linux/sunrpc/xprt.h	Tue Nov 12 16:18:57 2002
+++ 11-timeout/include/linux/sunrpc/xprt.h	Tue Nov 12 17:06:16 2002
@@ -182,9 +182,10 @@
 void			xprt_reserve(struct rpc_task *);
 void			xprt_transmit(struct rpc_task *);
 void			xprt_receive(struct rpc_task *);
-int			xprt_adjust_timeout(struct rpc_timeout *);
+void			xprt_adjust_timeout(struct rpc_timeout *);
 void			xprt_release(struct rpc_task *);
 void			xprt_connect(struct rpc_task *);
+void			xprt_disconnect(struct rpc_xprt *);
 int			xprt_clear_backlog(struct rpc_xprt *);
 void			xprt_sock_setbufsize(struct rpc_xprt *);
 
diff -ruN 10-connect3/net/sunrpc/clnt.c 11-timeout/net/sunrpc/clnt.c
--- 10-connect3/net/sunrpc/clnt.c	Tue Nov 12 16:19:25 2002
+++ 11-timeout/net/sunrpc/clnt.c	Tue Nov 12 17:13:24 2002
@@ -667,6 +667,9 @@
 		if (clnt->cl_autobind)
 			clnt->cl_port = 0;
 		task->tk_action = call_bind;
+		/* A disconnect can happen after only part of an RPC was
+		 * sent on a TCP socket.  send all of this request again */
+		req->rq_bytes_sent = 0;
 		break;
 	case -EAGAIN:
 		task->tk_action = call_transmit;
@@ -688,20 +691,34 @@
  * 6a.	Handle RPC timeout
  * 	We do not release the request slot, so we keep using the
  *	same XID for all retransmits.
+ *	For stream transports, shut down the transport socket when
+ *	a request sees a major time out.  When any request on this
+ *	connection is retried, the FSM notices the socket has been
+ *	shut down, and attempts to reconnect.
  */
 static void
 call_timeout(struct rpc_task *task)
 {
 	struct rpc_clnt	*clnt = task->tk_client;
-	struct rpc_timeout *to = &task->tk_rqstp->rq_timeout;
+	struct rpc_xprt *xprt = clnt->cl_xprt;
+	struct rpc_rqst *req = task->tk_rqstp;
+	struct rpc_timeout *to = &req->rq_timeout;
 
-	if (xprt_adjust_timeout(to)) {
-		dprintk("RPC: %4d call_timeout (minor)\n", task->tk_pid);
+	if (!xprt->stream)
+		xprt_adjust_timeout(to);
+
+	if (to->to_retries--) {
+		dprintk("RPC: %4d call_timeout (minor, retries=%d)\n",
+				task->tk_pid, to->to_retries);
 		goto retry;
 	}
-	to->to_retries = clnt->cl_timeout.to_retries;
+	to->to_retries = xprt->timeout.to_retries;
 
 	dprintk("RPC: %4d call_timeout (major)\n", task->tk_pid);
+
+	if (xprt->stream)
+		xprt_disconnect(xprt);
+
 	if (clnt->cl_softrtry) {
 		if (clnt->cl_chatty && !task->tk_exit)
 			printk(KERN_NOTICE "%s: server %s not responding, timed out\n",
@@ -710,15 +727,18 @@
 		return;
 	}
 
-	if (clnt->cl_chatty && !(task->tk_flags & RPC_CALL_MAJORSEEN) && rpc_ntimeo(&clnt->cl_rtt) > 7) {
-		task->tk_flags |= RPC_CALL_MAJORSEEN;
-		printk(KERN_NOTICE "%s: server %s not responding, still trying\n",
-			clnt->cl_protname, clnt->cl_server);
+	if (clnt->cl_chatty && !(task->tk_flags & RPC_CALL_MAJORSEEN)) {
+		if (xprt->stream || (rpc_ntimeo(&clnt->cl_rtt) > 7)) {
+			task->tk_flags |= RPC_CALL_MAJORSEEN;
+			printk(KERN_NOTICE "%s: server %s not responding, still trying\n",
+				clnt->cl_protname, clnt->cl_server);
+		}
 	}
 	if (clnt->cl_autobind)
 		clnt->cl_port = 0;
 
 retry:
+	req->rq_bytes_sent = 0;		/* send all of this request again */
 	clnt->cl_stats->rpcretrans++;
 	task->tk_action = call_bind;
 	task->tk_status = 0;
diff -ruN 10-connect3/net/sunrpc/xprt.c 11-timeout/net/sunrpc/xprt.c
--- 10-connect3/net/sunrpc/xprt.c	Tue Nov 12 16:31:08 2002
+++ 11-timeout/net/sunrpc/xprt.c	Tue Nov 12 17:06:16 2002
@@ -85,7 +85,6 @@
 static void	xprt_request_init(struct rpc_task *, struct rpc_xprt *);
 static void	do_xprt_transmit(struct rpc_task *);
 static inline void	do_xprt_reserve(struct rpc_task *);
-static void	xprt_disconnect(struct rpc_xprt *);
 static void	xprt_conn_status(struct rpc_task *task);
 static struct rpc_xprt * xprt_setup(int proto, struct sockaddr_in *ap,
 						struct rpc_timeout *to);
@@ -336,7 +335,7 @@
 /*
  * Adjust timeout values etc for next retransmit
  */
-int
+void
 xprt_adjust_timeout(struct rpc_timeout *to)
 {
 	if (to->to_retries > 0) {
@@ -362,7 +361,7 @@
 	}
 	pprintk("RPC: %lu %s\n", jiffies,
 			to->to_retries? "retrans" : "timeout");
-	return to->to_retries-- > 0;
+	return;
 }
 
 /*
@@ -394,7 +393,7 @@
 /*
  * Mark a transport as disconnected
  */
-static void
+void
 xprt_disconnect(struct rpc_xprt *xprt)
 {
 	dprintk("RPC:      disconnected transport %p\n", xprt);


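Resetting req->rq_bytes_sent matters because RPC over TCP uses record marking (RFC 1831). Every record starts with a 4-byte marker: the high bit flags the last fragment, and the low 31 bits give the fragment length. A server reading a fresh connection treats the first four bytes it receives as a marker. If a half-sent request resumed at its old offset, the server would be fed payload bytes where it expects a marker. This sketch shows the difference. rm_encode() and rm_first_fragment_len() are hypothetical helpers, and the sketch assumes the marker occupies the first four bytes of the send buffer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define RM_LAST_FRAG 0x80000000u

/* Encode a single-fragment RPC record (marker + payload) into out,
 * which must hold len + 4 bytes.  Returns the total record size. */
static size_t rm_encode(uint8_t *out, const uint8_t *payload, uint32_t len)
{
	uint32_t v = RM_LAST_FRAG | (len & 0x7fffffffu);

	out[0] = (uint8_t)(v >> 24);	/* network byte order */
	out[1] = (uint8_t)(v >> 16);
	out[2] = (uint8_t)(v >> 8);
	out[3] = (uint8_t)v;
	memcpy(out + 4, payload, len);
	return 4 + (size_t)len;
}

/* Fragment length a receiver on a fresh connection would decode: it
 * always interprets the first four bytes it sees as a record marker. */
static uint32_t rm_first_fragment_len(const uint8_t *stream, size_t n)
{
	assert(n >= 4);
	uint32_t v = ((uint32_t)stream[0] << 24) | ((uint32_t)stream[1] << 16) |
		     ((uint32_t)stream[2] << 8) | (uint32_t)stream[3];
	return v & 0x7fffffffu;
}
```

For an 8-byte payload "ABCDEFGH", resending from offset 0 gives the server a well-formed marker (length 8, last fragment). Resuming at offset 6 on the new connection would make it read the bytes "CDEF" as a fragment length of 0x43444546.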

end of thread

Thread overview: 15+ messages
2002-11-12 23:48 [PATCH] new timeout behavior for RPC requests on TCP sockets Dan Kegel
2002-11-13 15:58 ` Chuck Lever
2002-11-13 16:44   ` Richard B. Johnson
2002-11-13 16:49     ` Trond Myklebust
2002-11-13 18:38       ` Richard B. Johnson
2002-11-14 15:41         ` Trond Myklebust
2002-11-14 18:36           ` Richard B. Johnson
2002-11-14 19:33             ` Trond Myklebust
2002-11-14 20:26             ` Chuck Lever
2002-11-14 20:37               ` Richard B. Johnson
2002-11-14 21:05             ` Chuck Lever
2002-11-13 17:42     ` Alan Cox
2002-11-13 18:33       ` Richard B. Johnson
  -- strict thread matches above, loose matches on Subject: below --
2002-11-26 19:57 Chuck Lever
2002-11-12 23:15 Chuck Lever
