status of nfs and tcp with 2.4

* status of nfs and tcp with 2.4
From: James D Strandboge @ 2001-09-27 14:53 UTC
  To: LINUX-KERNEL

What is the status of tcp and nfs with the 2.4 kernel?  The sourceforge
site (regarding this) has not changed for a while and the NFS FAQ at 
sourceforge simply states:
nfsv3 over tcp does not work - the code for 2.4.x is as yet to be merged

What progress is being made toward this end?

Thanks,
Jamie Strandboge

-- 
Email:                  jstrand1@rochester.rr.com
GPG/PGP Public Key ID:  26384A3A
GPG/PGP Fingerprint:    D9FF DF4A 2D46 A353 A289  E8F5 AA75 DCBE 2638 4A3A

* Re: status of nfs and tcp with 2.4
From: Trond Myklebust @ 2001-09-27 15:32 UTC
  To: James D Strandboge; +Cc: LINUX-KERNEL

>>>>> " " == James D Strandboge <jstrand1@rochester.rr.com> writes:

     > What is the status of tcp and nfs with the 2.4 kernel?  The
     > sourceforge site (regarding this) has not changed for a while
     > and the NFS FAQ at sourceforge simply states: nfsv3 over tcp
     > does not work - the code for 2.4.x is as yet to be merged

     > What progress is being made toward this end?

None: AFAIK nobody has yet written any code that works for the server.

The client works though...

Cheers,
   Trond

* Re: status of nfs and tcp with 2.4
From: James D Strandboge @ 2001-09-27 17:10 UTC
  To: LINUX-KERNEL

On Thu, Sep 27, 2001 at 05:32:09PM +0200 or thereabouts, Trond Myklebust wrote:
> None: AFAIK nobody has yet written any code that works for the server.

In your opinion, how involved would it be to write the tcp code since
the udp is already written?  I haven't actually looked into it much,
and thought you might have some ideas, or perhaps pointers.

Jamie
-- 
GPG/PGP Info
Email:        jstrand1@rochester.rr.com
ID:           26384A3A
Fingerprint:  D9FF DF4A 2D46 A353 A289  E8F5 AA75 DCBE 2638 4A3A
--

* Re: status of nfs and tcp with 2.4
From: Bill Rugolsky Jr. @ 2001-09-27 17:27 UTC
  To: James D Strandboge; +Cc: LINUX-KERNEL

On Thu, Sep 27, 2001 at 01:10:30PM -0400, James D Strandboge wrote:
> On Thu, Sep 27, 2001 at 05:32:09PM +0200 or thereabouts, Trond Myklebust wrote:
> > None: AFAIK nobody has yet written any code that works for the server.
> 
> In your opinion, how involved would it be to write the tcp code since
> the udp is already written?  I haven't actually looked into it much,
> and thought you might have some ideas, or perhaps pointers.

Neil Brown answered a query from Martin Pool about this on the NFS list
back in July.  You should probably contact Martin.

Regards,

   Bill Rugolsky

> From: Neil Brown <neilb@cse.unsw.edu.au>
> To: Martin Pool <mbp@valinux.com>
> Message-ID: <15198.47569.868029.592501@notabene.cse.unsw.edu.au>
> Cc: nfs@lists.sourceforge.net, tpot@valinux.com
> Subject: Re: [NFS] NFSv3/tcp -- where to begin?
> In-Reply-To: message from Martin Pool on Wednesday July 25
> References: <20010725205307.B1435@wistful.humbug.org.au>
> List-Archive: <http://lists.sourceforge.net/archives//nfs/>
> Date: Wed, 25 Jul 2001 22:21:37 +1000 (EST)

On Wednesday July 25, mbp@valinux.com wrote:
> Can anybody give me some idea of what in particular is broken with NFS
> over TCP in knfsd?  I'd like to try and fix it.
> 
> I can start by just removing the #if 0 and seeing what breaks, but if
> some kind soul would point me in the right direction that would be
> great...

I think that it is all corner cases now.  I have run the SPEC SFS
benchmark against knfsd using tcp and got it to complete, so it sort
of works.

Issues that I can think of include:

- Guard against denial of service - impose some limit on the number of
  incoming connections, and start randomly dropping connections when
  this limit is exceeded.
- cope with fragmented rpc packets - or prove that they don't exist.
  RPC over TCP consists of a number of frames, each with a 4 byte
  header.  The bottom 31 bits are the frame size.  The top bit
  indicates whether this is a terminal fragment.
  A sequence of non-terminal fragments followed by a terminal
  fragment make one RPC packet.  The code currently rejects any
  non-terminal fragment and (I think) closes the connection.
  See comment in  net/sunrpc/svcsock.c:svc_tcp_recvfrom
  Many clients never send non-terminal fragments, but the spec says
  they are allowed so....
- Fix svc_tcp_sendto.
  If there is insufficient room in the socket buffers, the write will
  block (I think) and a dead client could tie up a tcp thread for a
  long time.  Alternately, the write might not block (I cannot
  remember) and some data will simply never be sent, which will
  confuse the client.
  There have been various suggestions for fixing this, like having a
  single thread given the responsibility of blocking, and
  disassociating the svc_rqst structure from the threads (currently
  there is one request structure per thread).
  Ultimately, you need to decide when you are going to say "I cannot
  deliver this reply", and then whether you will just drop the packet,
  or close the connection.
  You need to decide the maximum amount of buffers that you will
  allocate, and under what circumstances you will wait for space to be
  available in the queue.
  Maybe if there is insufficient space to write the whole reply then:
   if there are 10% idle threads, block,
   else close the connection.

  Also, you might want to throttle incoming requests when memory gets
  tight.  E.g. if any thread is blocking on writing to a tcp
  connection, don't accept any more requests on that connection.

- guard against ridiculously large incoming packets.  If a header
  arrives saying there are 10 million bytes to come, the code will
  currently wait for them.  It should reject any packet which claims
  to be larger than RPCSVC_MAXPAYLOAD.
  There is also a "FIXME" that points out that data is left on the
  incoming queue until a full frame has arrived.  If this is bigger
  than the TCP window size, it will never arrive.
  Now I think that RPCSVC_MAXPAYLOAD is smaller than the default
  window size, so the above fix should resolve this, but it should be checked.

- address every "FIXME" in net/sunrpc/svcsock.c
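
To illustrate the framing and size-guard items in the list above, here
is a minimal user-space sketch in C. It is not knfsd code and is not
taken from the thread. The names rpc_frag, rpc_record, rpc_frag_decode,
rpc_record_add and SKETCH_MAX_PAYLOAD are hypothetical, and the 64 KiB
cap is a placeholder standing in for RPCSVC_MAXPAYLOAD. The sketch
assumes the 4-byte marker arrives in network (big-endian) byte order,
as ONC RPC record marking specifies.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical cap for this sketch; stands in for RPCSVC_MAXPAYLOAD. */
#define SKETCH_MAX_PAYLOAD (64u * 1024u)

/* One decoded fragment header (hypothetical type). */
struct rpc_frag {
    uint32_t len;  /* fragment length: low 31 bits of the marker */
    bool last;     /* top bit set: this is the terminal fragment */
};

/* Running state for one RPC record built from fragments (hypothetical). */
struct rpc_record {
    uint32_t total;  /* bytes promised by the fragments seen so far */
    bool complete;   /* true once the terminal fragment was added */
};

/*
 * Decode a 4-byte big-endian record marker into *out.
 * Returns 0 on success, -1 if the fragment claims more than max bytes.
 */
static int rpc_frag_decode(const unsigned char hdr[4], uint32_t max,
                           struct rpc_frag *out)
{
    assert(hdr != NULL && out != NULL);

    uint32_t word = ((uint32_t)hdr[0] << 24) | ((uint32_t)hdr[1] << 16) |
                    ((uint32_t)hdr[2] << 8) | (uint32_t)hdr[3];

    out->last = (word & 0x80000000u) != 0;
    out->len = word & 0x7fffffffu;
    return out->len > max ? -1 : 0;
}

/*
 * Add a fragment to a record, accepting non-terminal fragments but
 * refusing any record whose total size would exceed max.
 * Returns 0 on success, -1 if the record is already complete or too big.
 */
static int rpc_record_add(struct rpc_record *rec, const struct rpc_frag *frag,
                          uint32_t max)
{
    assert(rec != NULL && frag != NULL && rec->total <= max);

    if (rec->complete)
        return -1;
    if (frag->len > max - rec->total)
        return -1;

    rec->total += frag->len;
    rec->complete = frag->last;
    return 0;
}
```

A caller would read 4 bytes, call rpc_frag_decode(), and drop the
connection on -1 instead of waiting for a payload that can never be
buffered. Whether to close the connection or merely discard the record
is the policy question raised in the list above; the sketch only
detects the condition.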

That should be enough to get you started :-)
It pretty much all fits the category of avoiding denial of service,
either deliberate or accidental.  Ask yourself "How can an obnoxious client
behave in a way that we don't expect and hence confuse or disable the
server."
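
One way to picture the connection-limit idea from the first list item
(cap the number of connections and drop one at random when the cap is
exceeded) is the following user-space C sketch. It is an illustration
only, not knfsd code. conn_table, conn_admit and SKETCH_MAX_CONNS are
hypothetical names, the limit of 4 is arbitrary, and the caller is
assumed to supply the random number and to close whatever descriptor is
returned.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical connection limit, kept tiny so the behaviour is easy to see. */
#define SKETCH_MAX_CONNS 4

/* Fixed-size table of active connections (hypothetical type). */
struct conn_table {
    int fds[SKETCH_MAX_CONNS];
    size_t count;
};

/*
 * Admit new_fd into the table.
 * If there is room, store it and return -1 (nothing to drop).
 * If the table is full, replace a randomly chosen existing connection
 * (picked from random_value) and return the evicted descriptor so the
 * caller can close it.
 */
static int conn_admit(struct conn_table *t, int new_fd, unsigned random_value)
{
    assert(t != NULL && t->count <= SKETCH_MAX_CONNS);

    if (t->count < SKETCH_MAX_CONNS) {
        t->fds[t->count++] = new_fd;
        return -1;
    }

    size_t victim = random_value % SKETCH_MAX_CONNS;
    int evicted = t->fds[victim];

    t->fds[victim] = new_fd;
    return evicted;
}
```

Evicting an existing connection rather than refusing the new one means
a client that opens many idle connections cannot lock out everyone
else, at the cost of occasionally dropping a well-behaved client, which
must then reconnect.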


NeilBrown




* Re: status of nfs and tcp with 2.4
From: Trond Myklebust @ 2001-09-28  8:46 UTC
  To: James D Strandboge; +Cc: LINUX-KERNEL

>>>>> " " == James D Strandboge <jstrand1@rochester.rr.com> writes:

     > On Thu, Sep 27, 2001 at 05:32:09PM +0200 or thereabouts, Trond
     > Myklebust wrote:
    >> None: AFAIK nobody has yet written any code that works for the
    >> server.

     > In your opinion, how involved would it be to write the tcp code
     > since the udp is already written?  I haven't actually looked
     > into it much, and thought you might have some ideas, or perhaps
     > pointers.

The biggest problem is to prevent the TCP server hogging all the
threads when a client gets congested.

With the UDP code, we use non-blocking I/O and simply drop all replies
that don't get through. For TCP, dropping replies is not acceptable as
the client will only resend requests every ~60 seconds. Currently, the
code therefore uses blocking I/O, which means that if the
socket blocks, you run out of free nfsd threads...

There are 2 possible strategies:

  1 Allocate 1 thread per TCP connection
  2 Use non-blocking I/O, but allow TCP connections to defer sending
    the reply until the socket is available (and allow the thread to
    service other requests while the socket is busy).

I started work on (2) last autumn, but I haven't had time to get much
done since then. It's on my list of priorities for 2.5.x though, so if
nobody else wants to get their hands dirty I will get back to it...
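
Strategy (2) can be pictured with a small user-space analogue in C.
This is a sketch under stated assumptions, not the knfsd or sunrpc
implementation and not the patch discussed in this thread.
pending_reply and reply_try_send are hypothetical names. The sketch
relies on the Linux send(2) flags MSG_DONTWAIT (do not sleep) and
MSG_NOSIGNAL (no SIGPIPE on a dead peer). The point is that a partly
sent reply is remembered instead of being dropped or slept on, so the
calling thread is free to serve other requests.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/socket.h>
#include <sys/types.h>

/* A reply that may have been only partly written (hypothetical type). */
struct pending_reply {
    const char *buf;  /* encoded reply */
    size_t len;       /* total bytes to send */
    size_t sent;      /* bytes already accepted by the socket */
};

/*
 * Push as much of the reply as the socket will take without sleeping.
 * Returns  1 when the whole reply has been sent,
 *          0 when the socket buffer is full (keep r and retry later),
 *         -1 on a hard error (the caller would close the connection).
 */
static int reply_try_send(int fd, struct pending_reply *r)
{
    assert(r != NULL && r->sent <= r->len);

    while (r->sent < r->len) {
        ssize_t n = send(fd, r->buf + r->sent, r->len - r->sent,
                         MSG_DONTWAIT | MSG_NOSIGNAL);

        if (n > 0) {
            r->sent += (size_t)n;
            continue;
        }
        if (n < 0 && errno == EINTR)
            continue;
        if (n < 0 && (errno == EAGAIN || errno == EWOULDBLOCK))
            return 0;  /* socket full: defer, do not block the thread */
        return -1;
    }
    return 1;
}
```

A server loop would keep each pending_reply that returned 0 on a
per-connection queue and call reply_try_send() again once the socket
reports that it is writable. It also needs a bound on how much may be
queued per connection; otherwise a client that never reads can still
exhaust server memory.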

Cheers,
   Trond

* Re: status of nfs and tcp with 2.4
From: James D Strandboge @ 2001-10-03 12:33 UTC
  To: Trond Myklebust; +Cc: LINUX-KERNEL

Forgive me, Trond, for sending a second reply to this. I am trying to
dig in and get my hands dirty, but want to make sure I understand the
problem.

> The biggest problem is to prevent the TCP server hogging all the
> threads when a client gets congested.
>
> With the UDP code, we use non-blocking I/O and simply drop all replies
> that don't get through. For TCP, dropping replies is not acceptable as
> the client will only resend requests every ~60 seconds. Currently, the
> code therefore uses blocking I/O, which means that if the
> socket blocks, you run out of free nfsd threads...

By 'when a client gets congested' my understanding is you mean 'when
a client is sending a lot to the server, and the server can't respond
quickly enough.'  Therefore, dropping udp replies is ok, since the
client will just send it again, however, with tcp, the client will only
resend every 60 seconds and that is too slow, and it blocks the socket
in the meantime.  Is my understanding correct?

> There are 2 possible strategies:
> 
>   1 Allocate 1 thread per TCP connection

This seems to be the easier of the two to implement, however you opted
against this because we are putting an eventual limit on the number of 
clients we can serve based on NFSD_MAXSERVS.  Is this correct?

>   2 Use non-blocking I/O, but allow TCP connections to defer sending
>     the reply until the socket is available (and allow the thread to
>     service other requests while the socket is busy).
>
> I started work on (2) last autumn, <snip>

Are there patches for this that I could look at?

Jamie Strandboge

-- 
GPG/PGP Info
Email:        jstrand1@rochester.rr.com
ID:           26384A3A
Fingerprint:  D9FF DF4A 2D46 A353 A289  E8F5 AA75 DCBE 2638 4A3A
--

* Re: status of nfs and tcp with 2.4
From: Trond Myklebust @ 2001-10-03 14:31 UTC
  To: James D Strandboge; +Cc: LINUX-KERNEL

>>>>> " " == James D Strandboge <jstrand1@rochester.rr.com> writes:

     > By 'when a client gets congested' my understanding is you mean
     > 'when a client is sending a lot to the server, and the server
     > can't respond quickly enough.'  Therefore, dropping udp replies

There are several scenarios. The one that worries me most on TCP
connections is when the TCP socket on the server gets swamped for some
reason, and the call to sendmsg() sleeps. This means that if one
client has fired off a load of requests, and then doesn't listen for
the reply, we can end up sleeping for a long time (and being
unavailable to other clients).

OTOH under UDP, we use nonblocking I/O, so the sendmsg() returns, and
the server can just drop the request (as the UDP allows for quick
resends). The thread in this scenario therefore never sleeps if a
client is unavailable. It can only sleep on (relatively fast) disk
I/O.

     > is ok, since the client will just send it again, however, with
     > tcp, the client will only resend every 60 seconds and that is
     > too slow, and it blocks the socket in the meantime.  Is my
     > understanding correct?

That is correct. TCP is designed to be a reliable protocol, so clients
are allowed to assume that the server will reply to a request once it
has been sent.

    >> There are 2 possible strategies:
    >>
    >> 1 Allocate 1 thread per TCP connection

     > This seems to be the easier of the two to implement, however
     > you opted against this because we are putting an eventual limit
     > on the number of clients we can serve based on NFSD_MAXSERVS.
     > Is this correct?

Well... Thread limits can be changed. My main objection is that it is
ugly. Why allocate a thread when what you want to do is to be able to
cope with sleeping? We have non-blocking I/O, and the tcp
'write_space()' socket routine (see the client use in
net/sunrpc/xprt.c) that was designed to enable a thread to get called
back once a socket is free.

    >> 2 Use non-blocking I/O, but allow TCP connections to defer
    >> sending the reply until the socket is available (and allow the
    >> thread to service other requests while the socket is busy).
    >>
    >> I started work on (2) last autumn, <snip>

     > Are there patches for this that I could look at?

  http://www.fys.uio.no/~trondmy/src/pre_alpha/linux-2.4.0-test6-rpctcp.dif

It's a patch against linux-2.4.0-test6 and is basically at the 'toy'
stage. Definitely nowhere near ready for release. IIRC though it did
actually run fairly reliably.

Cheers,
   Trond
