nfs udp 1000/100baseT issue

public inbox for linux-kernel@vger.kernel.org

* nfs udp 1000/100baseT issue
@ 2006-03-15 22:24 Bret Towe
  2006-03-16 20:41 ` Jan Engelhardt
  2006-03-17 11:18 ` Andrew Morton
  0 siblings, 2 replies; 10+ messages in thread
From: Bret Towe @ 2006-03-15 22:24 UTC (permalink / raw)
  To: linux-kernel

A while ago I noticed an issue when an NFS server has a gigabit connection
to a network and a client connects to that network via 100baseT instead:
the UDP connection from client to server fails. The client gets a
"server not responding" message when trying to access a file. The
interesting bit is that you can get a directory listing without issue.
The workaround I found is adding proto=tcp on the client side, after which
everything works without error.

I've seen this on kernels as far back as 2.6.13 on my own machines
(that was around the time I actually got gigabit at home), and I recently
noticed on some thin clients I maintain that 2.4 kernels on the client side
are also affected, so perhaps it's a server-side issue? All the servers
I've seen this on run 2.6. I haven't used 2.4 kernels in ages on my own
machines, so I haven't looked into whether 2.4 has this issue server side.

The error messages I see on the client side are as follows:
nfs: server vox.net not responding, still trying
nfs: server vox.net not responding, still trying
nfs: server vox.net not responding, still trying

server side shows no errors at all


I was able to cat a couple of files and narrow it down: it doesn't like
files over 28504 bytes.
The kernels here at the moment are 2.6.15.4 on both client and server, but
like I said earlier, the version seems not to matter much.

If any further info is needed, ask.
I can also do testing, but it might take me a while before I can get around to it;
it took me a couple of months just to get around to doing this :\


* Re: nfs udp 1000/100baseT issue
  2006-03-15 22:24 nfs udp 1000/100baseT issue Bret Towe
@ 2006-03-16 20:41 ` Jan Engelhardt
  2006-03-17  1:33   ` Bret Towe
  2006-03-17 11:18 ` Andrew Morton
  1 sibling, 1 reply; 10+ messages in thread
From: Jan Engelhardt @ 2006-03-16 20:41 UTC (permalink / raw)
  To: Bret Towe; +Cc: linux-kernel

>
>a while ago i noticed a issue when one has a nfs server that has
>gigabit connection
>to a network and a client that connects to that network instead via 100baseT
>that udp connection from client to server fails the client gets a
>server not responding
>message when trying to access a file, interesting bit is you can get a directory
>listing without issue
>work around i found for this is adding proto=tcp to the client side
>and all works
>without error

UDP has its implications, such as packets being silently dropped when the
link is full, by design. Try tcpdump on both systems and compare which
packets are sent and which arrive. The error message is then probably
because the client is confused by not receiving some packets.

>error message i see client side are as follows:
>nfs: server vox.net not responding, still trying
>nfs: server vox.net not responding, still trying
>nfs: server vox.net not responding, still trying


Jan Engelhardt
-- 


* Re: nfs udp 1000/100baseT issue
  2006-03-16 20:41 ` Jan Engelhardt
@ 2006-03-17  1:33   ` Bret Towe
  2006-03-17  2:20     ` Neil Brown
  0 siblings, 1 reply; 10+ messages in thread
From: Bret Towe @ 2006-03-17  1:33 UTC (permalink / raw)
  To: Jan Engelhardt; +Cc: linux-kernel

On 3/16/06, Jan Engelhardt <jengelh@linux01.gwdg.de> wrote:
> >
> >a while ago i noticed a issue when one has a nfs server that has
> >gigabit connection
> >to a network and a client that connects to that network instead via 100baseT
> >that udp connection from client to server fails the client gets a
> >server not responding
> >message when trying to access a file, interesting bit is you can get a directory
> >listing without issue
> >work around i found for this is adding proto=tcp to the client side
> >and all works
> >without error
>
> UDP has its implications, like silently dropping packets when the link
> is full, by design. Try tcpdump on both systems and compare what packets
> are sent and which do arrive. The error message is then probably because
> the client is confused of not receiving some packets.

After comparing a working and a non-working client, I found that
packets containing offsets 19240, 20720 and 22200 are missing,
and the 100baseT client had an extra offset of 32560;
on the working client it ends at 31080.

The missing ones are almost always missing; 22200 appears every so often
on retransmission, and 23680 also disappears every so often.
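
The reported offsets are all multiples of 1480, which is what IP
fragmentation of one large UDP datagram would produce on a link with a
1500-byte MTU (1500 minus a 20-byte IP header without options). A minimal
Python sketch, assuming that MTU and a datagram of 32776 bytes (a 32768-byte
read plus the 8-byte UDP header, ignoring RPC/NFS reply headers; both figures
are assumptions, not values taken from the capture):

```python
IP_HEADER = 20  # bytes, assuming no IP options


def fragment_offsets(datagram_len, mtu=1500):
    """Return the byte offset of each IP fragment of one datagram.

    datagram_len is the UDP header plus payload. Every fragment except
    the last carries a multiple of 8 bytes, as IP fragmentation requires.
    """
    step = (mtu - IP_HEADER) // 8 * 8  # 1480 for a 1500-byte MTU
    return list(range(0, datagram_len, step))


# Assumed size: 32768-byte read + 8-byte UDP header.
offsets = fragment_offsets(32768 + 8)
reported = [19240, 20720, 22200, 23680, 31080, 32560]

print(len(offsets), "fragments, last offset", offsets[-1])
for off in reported:
    print(off, "is fragment number", offsets.index(off))
```

Under these assumptions the missing offsets correspond to consecutive
fragments in the middle of a single reply.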

I hope that isn't too confusing; I don't use tcpdump-type tools much
(well, I did give up on tcpdump and had to use ethereal...)

> >error message i see client side are as follows:
> >nfs: server vox.net not responding, still trying
> >nfs: server vox.net not responding, still trying
> >nfs: server vox.net not responding, still trying
>
>
> Jan Engelhardt
> --
>


* Re: nfs udp 1000/100baseT issue
  2006-03-17  1:33   ` Bret Towe
@ 2006-03-17  2:20     ` Neil Brown
  2006-03-17  3:11       ` Bret Towe
  0 siblings, 1 reply; 10+ messages in thread
From: Neil Brown @ 2006-03-17  2:20 UTC (permalink / raw)
  To: Bret Towe; +Cc: Jan Engelhardt, linux-kernel

On Thursday March 16, magnade@gmail.com wrote:
> On 3/16/06, Jan Engelhardt <jengelh@linux01.gwdg.de> wrote:
> > >
> > >a while ago i noticed a issue when one has a nfs server that has
> > >gigabit connection
> > >to a network and a client that connects to that network instead via 100baseT
> > >that udp connection from client to server fails the client gets a
> > >server not responding
> > >message when trying to access a file, interesting bit is you can get a directory
> > >listing without issue
> > >work around i found for this is adding proto=tcp to the client side
> > >and all works
> > >without error
> >
> > UDP has its implications, like silently dropping packets when the link
> > is full, by design. Try tcpdump on both systems and compare what packets
> > are sent and which do arrive. The error message is then probably because
> > the client is confused of not receiving some packets.
> 
> after compairing a working and not working client i found that
> packets containing offset 19240, 20720, 22200 are missing
> and the 100baseT client had an extra offset of 32560
> on the working client it ends at 31080
> 
> the missing ones are mostly constantly missing 22200 appears every so often
> on retransmission and 23680 also disappears every so often
> 
> i hope that isnt too confusing i dont use tcpdump type stuff much
> (well i did give up on tcpdump and had to use ethereal...)

This is all to be expected.  I remember having this issue with a
server on 100M and clients on 10M...

There is no flow control in UDP.  If anything gets lost, the client
has to resend the request, and the server then has to respond again.
If the response is large (e.g. a read) and gets fragmented (if > 1500 bytes),
then there is a good chance that one or more fragments of a reply will
get lost in the switch stepping down from 1G to 100M.  Every time.
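
A toy model of the step-down, assuming a tail-drop switch queue that holds
a fixed number of frames, a sender that emits all fragments back to back at
line rate, and a 10:1 speed ratio. The function name and the buffer sizes
are illustrative, not measurements of any real switch:

```python
def dropped_fragments(n_frames, buffer_frames, speed_ratio=10):
    """Return the indices of frames dropped by a tail-drop queue.

    One frame arrives per tick on the fast port; the slow port finishes
    sending one queued frame every `speed_ratio` ticks.
    """
    queued = 0
    dropped = []
    for i in range(n_frames):
        if queued < buffer_frames:
            queued += 1          # frame fits in the buffer
        else:
            dropped.append(i)    # buffer full: frame is silently lost
        if (i + 1) % speed_ratio == 0 and queued > 0:
            queued -= 1          # slow port drained one frame
    return dropped


# A 23-fragment reply through a small and a large buffer.
print(dropped_fragments(23, buffer_frames=8))
print(dropped_fragments(23, buffer_frames=64))
# A reply that fits in a single frame.
print(dropped_fragments(1, buffer_frames=8))
```

Since a UDP datagram is only delivered if every fragment arrives, a single
entry in the returned list is enough to lose the whole reply, and a
retransmitted reply hits the same queue in the same way.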

Your options include:

  - use tcp
  - get a switch with a (much) bigger packet buffer
  - drop the server down to 100M
  - drop the nfs rsize down to 1024 so you don't get fragments.
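
The two client-side options map onto NFS mount options. A small Python
sketch that only builds the command line (it does not run it); the export
path /export and the mount point /mnt/nfs are hypothetical, while vox.net
is the server name from the error messages earlier in the thread:

```python
def nfs_mount_command(server, export, mountpoint, use_tcp=True, rsize=None):
    """Build (but do not execute) a mount command for an NFS client."""
    opts = ["proto=tcp" if use_tcp else "proto=udp"]
    if rsize is not None:
        opts.append(f"rsize={rsize}")  # cap the size of READ requests
    return ["mount", "-t", "nfs", "-o", ",".join(opts),
            f"{server}:{export}", mountpoint]


# Option 1: switch the transport to TCP.
print(" ".join(nfs_mount_command("vox.net", "/export", "/mnt/nfs")))
# Option 4: stay on UDP but keep each read reply within one frame.
print(" ".join(nfs_mount_command("vox.net", "/export", "/mnt/nfs",
                                 use_tcp=False, rsize=1024)))
```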

NeilBrown


* Re: nfs udp 1000/100baseT issue
  2006-03-17  2:20     ` Neil Brown
@ 2006-03-17  3:11       ` Bret Towe
  2006-03-17  3:41         ` Neil Brown
  2006-03-17  9:13         ` Helge Hafting
  0 siblings, 2 replies; 10+ messages in thread
From: Bret Towe @ 2006-03-17  3:11 UTC (permalink / raw)
  To: Neil Brown; +Cc: Jan Engelhardt, linux-kernel

On 3/16/06, Neil Brown <neilb@suse.de> wrote:
> On Thursday March 16, magnade@gmail.com wrote:
> > On 3/16/06, Jan Engelhardt <jengelh@linux01.gwdg.de> wrote:
> > > >
> > > >a while ago i noticed a issue when one has a nfs server that has
> > > >gigabit connection
> > > >to a network and a client that connects to that network instead via 100baseT
> > > >that udp connection from client to server fails the client gets a
> > > >server not responding
> > > >message when trying to access a file, interesting bit is you can get a directory
> > > >listing without issue
> > > >work around i found for this is adding proto=tcp to the client side
> > > >and all works
> > > >without error
> > >
> > > UDP has its implications, like silently dropping packets when the link
> > > is full, by design. Try tcpdump on both systems and compare what packets
> > > are sent and which do arrive. The error message is then probably because
> > > the client is confused of not receiving some packets.
> >
> > after compairing a working and not working client i found that
> > packets containing offset 19240, 20720, 22200 are missing
> > and the 100baseT client had an extra offset of 32560
> > on the working client it ends at 31080
> >
> > the missing ones are mostly constantly missing 22200 appears every so often
> > on retransmission and 23680 also disappears every so often
> >
> > i hope that isnt too confusing i dont use tcpdump type stuff much
> > (well i did give up on tcpdump and had to use ethereal...)
>
> This is all to be expected.  I remember having this issue with a
> server on 100M and clients in 10M...
>
> There is no flow control in UDP

Is this a Linux design flaw or just the nature of UDP?

>.  If anything gets lots, the client
> has to resend the request, and the server then has to respond again.
> If the respond is large (e.g. a read) and gets fragmented (if > 1500bytes)
> then there is a good chance that one or more fragments of a reply will
> get lots in the switch stepping down from 1G to 100M.  Every time.
>
> Your options include:
>
>   - use tcp

I'm wondering why this isn't the default to begin with.

>   - get a switch with a (much) bigger packet buffer
>   - drop the server down to 100M
>   - drop the nfs rsize down to 1024 to you don't get fragments.
These last two options sound rather painful speed-wise;
the TCP workaround is probably by far the easiest.

>
> NeilBrown
>


* Re: nfs udp 1000/100baseT issue
  2006-03-17  3:11       ` Bret Towe
@ 2006-03-17  3:41         ` Neil Brown
  2006-03-17  4:13           ` Lee Revell
  2006-03-17  9:13         ` Helge Hafting
  1 sibling, 1 reply; 10+ messages in thread
From: Neil Brown @ 2006-03-17  3:41 UTC (permalink / raw)
  To: Bret Towe; +Cc: Jan Engelhardt, linux-kernel

On Thursday March 16, magnade@gmail.com wrote:
> On 3/16/06, Neil Brown <neilb@suse.de> wrote:
> >
> > There is no flow control in UDP
> 
> is this a linux design flaw or just nature of udp?

Just the nature of UDP.

> >
> >   - use tcp
> 
> im wondering why this isnt the default to begin with

Because it wasn't that many years ago that Linux NFS didn't support
tcp at all.
Some distributions modify 'mount' to get it to prefer tcp over udp.

NeilBrown


* Re: nfs udp 1000/100baseT issue
  2006-03-17  3:41         ` Neil Brown
@ 2006-03-17  4:13           ` Lee Revell
  0 siblings, 0 replies; 10+ messages in thread
From: Lee Revell @ 2006-03-17  4:13 UTC (permalink / raw)
  To: Neil Brown; +Cc: Bret Towe, Jan Engelhardt, linux-kernel

On Fri, 2006-03-17 at 14:41 +1100, Neil Brown wrote:
> 
> > >
> > >   - use tcp
> > 
> > im wondering why this isnt the default to begin with
> 
> Because it wasn't that many years ago that Linux NFS didn't support
> tcp at all.
> Some distributions modify 'mount' to get it to prefer tcp over udp. 

There are also historical reasons that predate Linux: the original NFS
implementations were UDP only.  TCP was not an option until NFSv3.

Lee



* Re: nfs udp 1000/100baseT issue
  2006-03-17  3:11       ` Bret Towe
  2006-03-17  3:41         ` Neil Brown
@ 2006-03-17  9:13         ` Helge Hafting
  1 sibling, 0 replies; 10+ messages in thread
From: Helge Hafting @ 2006-03-17  9:13 UTC (permalink / raw)
  To: Bret Towe; +Cc: Neil Brown, Jan Engelhardt, linux-kernel

Bret Towe wrote:

>On 3/16/06, Neil Brown <neilb@suse.de> wrote:
>  
>
>>There is no flow control in UDP
>>    
>>
>
>is this a linux design flaw or just nature of udp?
>  
>
That has nothing to do with Linux at all.

"No flow control in UDP" is a UDP design issue.  And it is not
a flaw either. The rule is simple:

If you need flow control, use TCP.
If you don't need flow control, and don't want the
overhead of flow control, use UDP.

UDP is for those cases where flow control is considered a waste of time.

Now, the original decision to base early NFS on UDP, that was
a design mistake.  Again, not a Linux problem but an NFS problem.
Fortunately, today a solution for this exists and is implemented
in Linux: NFS over TCP.

>>.  If anything gets lots, the client
>>has to resend the request, and the server then has to respond again.
>>If the respond is large (e.g. a read) and gets fragmented (if > 1500bytes)
>>then there is a good chance that one or more fragments of a reply will
>>get lots in the switch stepping down from 1G to 100M.  Every time.
>>
>>Your options include:
>>
>>  - use tcp
>>    
>>
>
>im wondering why this isnt the default to begin with
>  
>
Hard to say.  I guess someone thought they could get better
performance with UDP, since it has less overhead,
and then didn't bother testing this idea on a somewhat congested network?

Helge Hafting


* Re: nfs udp 1000/100baseT issue
  2006-03-15 22:24 nfs udp 1000/100baseT issue Bret Towe
  2006-03-16 20:41 ` Jan Engelhardt
@ 2006-03-17 11:18 ` Andrew Morton
  2006-03-17 15:53   ` Trond Myklebust
  1 sibling, 1 reply; 10+ messages in thread
From: Andrew Morton @ 2006-03-17 11:18 UTC (permalink / raw)
  To: Bret Towe; +Cc: linux-kernel

"Bret Towe" <magnade@gmail.com> wrote:
>
> ive seen this on kernels as far back as 2.6.13 on my own machines
>  (was around that time when i accutally got gigabit at home)
>  and recently noticed on some thin clients i maintain that 2.4 kernels
>  on the client side are also affected so perhaps its server side issue?
>  as all servers ive seen this on are 2.6 i havent used 2.4 kernels in ages
>  on my own machines so i havent looked into if 2.4 has that issue server side
>  or not

It would be interesting if you could do so.  I do recall that
nfs-over-crappy-udp was much better behaved in 2.4...



* Re: nfs udp 1000/100baseT issue
  2006-03-17 11:18 ` Andrew Morton
@ 2006-03-17 15:53   ` Trond Myklebust
  0 siblings, 0 replies; 10+ messages in thread
From: Trond Myklebust @ 2006-03-17 15:53 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Bret Towe, linux-kernel

On Fri, 2006-03-17 at 03:18 -0800, Andrew Morton wrote:
> "Bret Towe" <magnade@gmail.com> wrote:
> >
> > ive seen this on kernels as far back as 2.6.13 on my own machines
> >  (was around that time when i accutally got gigabit at home)
> >  and recently noticed on some thin clients i maintain that 2.4 kernels
> >  on the client side are also affected so perhaps its server side issue?
> >  as all servers ive seen this on are 2.6 i havent used 2.4 kernels in ages
> >  on my own machines so i havent looked into if 2.4 has that issue server side
> >  or not
> 
> It would be interesting if you could do so.  I do recall that
> nfs-over-crappy-udp was much better behaved in 2.4...

The 2.6 servers allow clients to use 32k block sizes for READ and WRITE
requests, and set that as the preferred size for both TCP and UDP. In
2.4, they only supported 8k blocks.
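
To see why the block size matters over UDP, here is a rough Python sketch
that counts IP fragments per READ reply and the chance that all of them
arrive. It assumes a 1500-byte MTU, roughly 128 bytes of RPC/NFS reply
headers (a guess), and an independent per-fragment loss probability (a
simplification; real switch drops come in bursts):

```python
import math

IP_HEADER = 20   # bytes, assuming no IP options
UDP_HEADER = 8   # bytes


def fragments_per_reply(block_size, mtu=1500, rpc_overhead=128):
    """Number of IP fragments needed for one READ reply over UDP."""
    per_fragment = (mtu - IP_HEADER) // 8 * 8   # payload bytes per fragment
    datagram = UDP_HEADER + rpc_overhead + block_size
    return math.ceil(datagram / per_fragment)


def reply_survival(block_size, p_loss):
    """Probability that every fragment of one reply arrives."""
    return (1.0 - p_loss) ** fragments_per_reply(block_size)


for size in (8192, 32768):
    print(size, fragments_per_reply(size),
          round(reply_survival(size, 0.05), 3))
```

Under these assumptions a 32k reply needs about four times as many
fragments as an 8k reply, so it is much more exposed to the loss of any
single fragment.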

Cheers,
  Trond


