RedHat 8.0 nfs

* RedHat 8.0 nfs
@ 2003-03-20 16:52 pwitting
  2003-03-20 20:28 ` Steve Dickson
  0 siblings, 1 reply; 9+ messages in thread
From: pwitting @ 2003-03-20 16:52 UTC (permalink / raw)
  To: nfs

I'm migrating some decent-sized RH NFS servers to RedHat 8.0. Looking things
over, it seems they've changed the nfs scripts fairly significantly, adding
the possibility of an /etc/sysconfig/nfs file (doesn't exist by default) for
storing config parameters. I went ahead and created the sysconfig file using
the defaults, then changed the number of threads from the default 8 to 120:

cat /etc/sysconfig/nfs 
# Referenced by Red Hat 8.0 nfs script to set initial values

# Number of threads to start.
RPCNFSDCOUNT=120

# yes, no, or auto (attempts to auto-detect support and enable)
MOUNTD_NFS_V2=auto
MOUNTD_NFS_V3=auto

# Should we tune TCP/IP settings for nfs (consumes RAM)
TUNE_QUEUE=yes
# 256kb recommended minimum size based on SPECsfs NFS benchmarks
# default values:
#       net.core.rmem_default    65535
#       net.core.rmem_max       131071
NFS_QS=262144

# Force rpc.mountd to bind to the specified port num, instead of using 
# the random port number assigned by the portmapper.
#MOUNTD_PORT=

# Any other options can be passed here. The primary option not covered 
# here would be -o num or  --descriptors num; which set the limit of 
# the number of open file descriptors to num. The default is 256.
#RPCMOUNTDOPTS="-o 256"
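
A minimal sketch of how a file like the one above can be read back for
checking, assuming it holds only plain shell-style KEY=VALUE lines and
comments. The helper name parse_sysconfig and the SAMPLE text are made up
for illustration; the real init script simply sources the file.

```python
def parse_sysconfig(text):
    """Parse shell-style KEY=VALUE lines, skipping blanks and comments."""
    settings = {}
    for raw in text.splitlines():
        line = raw.strip()
        # Commented-out settings such as "#MOUNTD_PORT=" are ignored.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        settings[key.strip()] = value.strip().strip('"')
    return settings


# Hypothetical excerpt of /etc/sysconfig/nfs, trimmed for the example.
SAMPLE = """\
# Number of threads to start.
RPCNFSDCOUNT=120
MOUNTD_NFS_V2=auto
TUNE_QUEUE=yes
NFS_QS=262144
#MOUNTD_PORT=
#RPCMOUNTDOPTS="-o 256"
"""

if __name__ == "__main__":
    for key, value in parse_sysconfig(SAMPLE).items():
        print(key, "=", value)
```

Values come back as strings; convert RPCNFSDCOUNT and NFS_QS with int()
before doing arithmetic on them.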

So what guidelines should I use in setting these values? RH 7.3 limited me
to 128 threads (I got nervous using ALL the possible threads and backed it
off), RH8 supports more. Judging by the /proc, there is a significant amount
of time where all threads are in use, even at 120.

Also, what would be a good number for NFS_QS? Both rmem_default and rmem_max
will be set to this number; should it be a multiple of the thread count? Say,
something like the smallest power of two (2^n) greater than 1500 (MTU size) *
$RPCNFSDCOUNT?

So 120*1500 = 180,000 -> 262,144, but
   220*1500 = 330,000 -> 524,288
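
The rounding rule proposed here can be sketched as follows. This only
illustrates the arithmetic in the question (thread count times a 1500-byte
MTU, rounded up to the next power of two); it is not a recommendation, and
the function names are invented for the example.

```python
MTU = 1500  # bytes, the Ethernet MTU used in the question


def next_power_of_two(n):
    """Smallest power of two strictly greater than n (n >= 0)."""
    return 1 << n.bit_length()


def queue_size_for(threads, unit=MTU):
    """Round threads * unit up to the next power of two."""
    return next_power_of_two(threads * unit)


if __name__ == "__main__":
    for threads in (120, 220):
        print(threads, threads * MTU, queue_size_for(threads))
```

For 120 threads this gives 262144 and for 220 threads 524288, matching the
two figures above.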

Or should RPCNFSDCOUNT itself be some power of 2?

And while we're on the subject of RedHat, any idea how up to date the NFS
code is in their latest kernel (2.4.18-27)? It's nice relying on their bug
testing/security/useful_patches/fixes, but I saw a significant performance
increase by migrating to 2.4.20 a few months ago. I also ran into a wall
trying to integrate the Q-Logic FC drivers I need for some of my systems.

Thanks

-------------------------------------------------------
This SF.net email is sponsored by: Tablet PC.  
Does your code think in ink? You could win a Tablet PC. 
Get a free Tablet PC hat just for playing. What are you waiting for? 
http://ads.sourceforge.net/cgi-bin/redirect.pl?micr5043en
_______________________________________________
NFS maillist  -  NFS@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nfs


* Re: RedHat 8.0 nfs
  2003-03-20 16:52 RedHat 8.0 nfs pwitting
@ 2003-03-20 20:28 ` Steve Dickson
  0 siblings, 0 replies; 9+ messages in thread
From: Steve Dickson @ 2003-03-20 20:28 UTC (permalink / raw)
  To: nfs



pwitting@Cyveillance.com wrote

>So what guidelines should I use in setting these values? RH 7.3 limited me
>to 128 threads (I got nervous using ALL the possible threads and backed it
>off), RH8 supports more. Judging by the /proc, there is a significant amount
>of time where all threads are in use, even at 120
>
Not knowing anything about the size of your machines, the type of
traffic, your network, or the history of what works and what doesn't
makes it tough to give a decisive answer... So I can only suggest:
crank it up until it hurts. :-) Theoretically RH8.0 can support up to 32000
processes and 1200 threaded processes, so it's all dependent on the size
of your machine (i.e. number of CPUs, amount of memory, type of storage).
Experimentation is your friend...

>
>Also, what would be a good number for NFS_QS? Both rmem.default and rmem.max
>will be set to this number; should it be a multiple of threads? Say
>something like smallest binary power (2^n) greater than 1500 (MTU size) *
>$RPCNFSDCOUNT 
>
>So 120*1500 = 180,000 -> 262,144, but
>   220*1500 = 330,000 -> 524,288
>  
>
Again, this depends on how much memory you have...

>Or should RPCNFSDCOUNT itself be some power of 2?
>
I don't think it matters or at least I don't see why it should...

>
>And while we're on the subject of RedHat, any idea how up to date the NFS
>code is in their latest kernel (2.4.18-27)? Its nice relying on their bug
>testing/security/useful_patches/fixes, but I saw a significant performance
>increase by migrating to 2.4.20 a few months ago. I also ran into a wall
>trying to integrate the Q-Logic FC drivers I need for some of my systems.
>  
>
Unfortunately, we were not able to get in some of the latest performance
enhancements due to our QA cycle...

SteveD.

* Re: RedHat 8.0 nfs
@ 2003-03-24 18:42 pwitting
  2003-03-25 22:50 ` David Dougall
  0 siblings, 1 reply; 9+ messages in thread
From: pwitting @ 2003-03-24 18:42 UTC (permalink / raw)
  To: nfs

>>Also, what would be a good number for NFS_QS? Both rmem.default and
>>rmem.max will be set to this number; should it be a multiple of threads?
>>Say something like smallest binary power (2^n) greater than 1500 (MTU
>>size) * $RPCNFSDCOUNT

> Again this dependent on how much memory you have...

I was reviewing this topic in "Red Hat Linux Security & Optimization"; it
claims that each nfsd daemon will receive a queue of size NFS_QS, so I
assume the above concerns aren't relevant (aside from memory concerns:
256KB * 200 threads/daemons = 50MB for input queues :^)

As a side note, with the new RH scripts upping the queue size, and revamped
client mount scripts upping the rsize/wsize, thread utilization seems to be
way down and performance up:

th 120 5652 4721.126 892.199 136.818 25.683 5.923 2.136 1.755 1.833 1.689 3.267
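
A small sketch for reading a line like this one. It assumes the layout
commonly described for the "th" line of /proc/net/rpc/nfsd on 2.4-era
kernels: the thread count, the number of times all threads were busy at
once, then ten histogram buckets giving the seconds spent with 0-10%,
10-20%, ... 90-100% of the threads in use. That layout is an assumption
here, not something spelled out in this thread, and the function names are
invented for the example.

```python
def parse_th_line(line):
    """Split a 'th' line into (threads, all_busy_count, histogram)."""
    fields = line.split()
    if len(fields) != 13 or fields[0].lower() != "th":
        raise ValueError("expected 'th' followed by 12 numeric fields")
    threads = int(fields[1])
    all_busy = int(fields[2])
    histogram = [float(value) for value in fields[3:]]
    return threads, all_busy, histogram


def busy_shares(histogram):
    """Fraction of the recorded time spent in each utilization bucket."""
    total = sum(histogram)
    if total == 0:
        return [0.0] * len(histogram)
    return [seconds / total for seconds in histogram]


SAMPLE = ("th 120 5652 4721.126 892.199 136.818 25.683 5.923 "
          "2.136 1.755 1.833 1.689 3.267")

if __name__ == "__main__":
    threads, all_busy, histogram = parse_th_line(SAMPLE)
    print("threads:", threads, "all busy:", all_busy)
    for bucket, share in enumerate(busy_shares(histogram)):
        print("%3d-%3d%%: %6.2f%%" % (bucket * 10, bucket * 10 + 10, share * 100))
```

Under that reading, most of the recorded time in the sample falls in the
lowest bucket and very little in the last one, which is the "100% numbers"
figure.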

In the past my 100% numbers would have spiked sharply upwards (and this is
with a recompiled official RedHat kernel, not the known-faster 2.4.20
kernel). 'Course, it's also with a new core system, running dual 1.3GHz CPUs
instead of dual 667MHz; but the IO is pretty much the same, an FC-attached
IBM ESS Shark. But I expect that the CPUs aren't a huge factor, since the
old system never went above about 60% utilization in a pure NFS mode.

So all in all, life is good. Thanks for the help.


* Re: RedHat 8.0 nfs
  2003-03-24 18:42 pwitting
@ 2003-03-25 22:50 ` David Dougall
  2003-03-25 23:19   ` Trond Myklebust
  2003-03-25 23:53   ` Neil Brown
  0 siblings, 2 replies; 9+ messages in thread
From: David Dougall @ 2003-03-25 22:50 UTC (permalink / raw)
  To: pwitting@Cyveillance.com; +Cc: nfs@lists.sourceforge.net

This doesn't agree with what I have heard before.  I was always told that
this memory amount (256k default) was shared by all threads.  Therefore
the more threads you had, the more contention for memory, the less each
thread had, etc.  You are now saying that each thread gets its own amount.
Which one is correct?  This drastically affects the number of threads and
the value that I assign to this parameter.
Anyone who authoritatively knows, please advise.
--David Dougall




* Re: RedHat 8.0 nfs
  2003-03-25 22:50 ` David Dougall
@ 2003-03-25 23:19   ` Trond Myklebust
  2003-03-25 23:53   ` Neil Brown
  1 sibling, 0 replies; 9+ messages in thread
From: Trond Myklebust @ 2003-03-25 23:19 UTC (permalink / raw)
  To: David Dougall; +Cc: pwitting@Cyveillance.com, nfs@lists.sourceforge.net

>>>>> " " == David Dougall <davidd@et.byu.edu> writes:

     > This doesn't agree with what I have heard before.  I was always
     > told that this memory amount( 256k default) was shared by all
     > threads.  Therefore the more threads you had, the more

You were misinformed. In Linux 2.4.x, each RPC server thread allocates
a private buffer.

Cheers,
  Trond



* Re: RedHat 8.0 nfs
  2003-03-25 22:50 ` David Dougall
  2003-03-25 23:19   ` Trond Myklebust
@ 2003-03-25 23:53   ` Neil Brown
  1 sibling, 0 replies; 9+ messages in thread
From: Neil Brown @ 2003-03-25 23:53 UTC (permalink / raw)
  To: David Dougall; +Cc: pwitting@Cyveillance.com, nfs@lists.sourceforge.net

On Tuesday March 25, davidd@et.byu.edu wrote:
> This doesn't agree with what I have heard before.  I was always told that
> this memory amount( 256k default) was shared by all threads.  Therefore
> the more threads you had, the more contention for memory, less each thread
> had, etc.  You are now saying that each thread gets its own amount.
> Which one is correct.  This drastically affects the number of threads and
> the value that I assign in this parameter.
> Anyone who authoritatively knows, please advise.

Well.... I wrote some of the code, so hopefully I understand what is
going on, but don't bet on it :-)

Firstly, for kernels later than about 2.4.20-pre4, this setting is not
needed and is ignored.  The nfsd code does 'the right thing'.  
The setting is only needed for earlier kernels.

The amount of memory you assign when you write a number out to
 /proc/sys/net/core/rmem_default
is a per-socket (not per-thread) maximum.

The memory is *not* pre-allocated and is *not* guaranteed to be
available.

When a packet arrives on a socket, it sits on an 'input queue' until
it is taken off by some process.  The networking layer limits the
amount of memory that can be used by the input queue of any socket.
This number that is being changed is that limit.
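
To see what the limit is currently set to, a sketch along these lines reads
the two values back. It assumes a Linux system exposing
/proc/sys/net/core/rmem_default and /proc/sys/net/core/rmem_max; the
function name and the base parameter (there only so the code can be pointed
at another directory) are made up for the example.

```python
import os


def read_rmem_limits(base="/proc/sys/net/core"):
    """Return the per-socket receive buffer limits, in bytes."""
    limits = {}
    for name in ("rmem_default", "rmem_max"):
        with open(os.path.join(base, name)) as handle:
            limits[name] = int(handle.read().strip())
    return limits


if __name__ == "__main__":
    try:
        for name, value in read_rmem_limits().items():
            print(name, value)
    except OSError as exc:
        # Not on Linux, or /proc is not mounted.
        print("could not read limits:", exc)
```

These are ceilings on how much the input queue of a socket may hold, not
amounts of memory set aside in advance.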

The current nfsd code leaves UDP packets on the incoming queue while
processing them.  This means that we need this limit to be fairly
high: high enough that one request per thread does not block the queue,
so at least a few further requests can arrive.

With this number at the default (64k), I fairly often noticed incoming
requests being silently dropped because there wasn't enough buffer
space, even though there were plenty of idle threads.

I raised it to "average packet size * number of threads" and the
problem went away.  256K seems a reasonable sort of number.

This doesn't mean that 256K *will* be used, but that (if it is
available) it *may* be used.  Certainly 256K * number-of-threads will
NOT be used.

Hope that helps.

NeilBrown





* RE: RedHat 8.0 nfs
@ 2003-03-25 23:27 pwitting
  0 siblings, 0 replies; 9+ messages in thread
From: pwitting @ 2003-03-25 23:27 UTC (permalink / raw)
  To: davidd; +Cc: nfs

I did some more checking, per "Linux NFS and Automounter Administration",
the queue is split between all the NFS server threads. So the default 64k 
is split between the 8 default threads, 8k per thread.

So some of this seems to be confusion between what is a thread and what is a
daemon; per Kabir's book, we are starting 8 daemons by default, and each
daemon gets its own queue. Per Craig Hunt, we are starting 8 threads by
default, and the queue is split between the threads of a single daemon.

As David points out, there's a huge difference in the two semantic points,
especially for those of us running hundreds of threads. 

Anyone know for sure? Any quick and dirty tests to prove this to ourselves?





* Re: RedHat 8.0 nfs
@ 2003-03-26 18:34 pwitting
  2003-03-26 23:43 ` Neil Brown
  0 siblings, 1 reply; 9+ messages in thread
From: pwitting @ 2003-03-26 18:34 UTC (permalink / raw)
  To: nfs; +Cc: neilb, davidd

From: Neil Brown <neilb@cse.unsw.edu.au>

> Well.... I wrote some of the code, so hopefully I understand what is
> going on, but don't bet on it :-)
> 
> Firstly, for kernels later than about 2.4.20-pre4, this setting is not
> needed and is ignored.  The nfsd code does 'the right thing'.
> The setting is only needed for earlier kernels.

While this sounds great, what exactly is the right thing? Is the code itself
adjusting the buffers to 256k? 512k? Or is it addressing the problem from a
different angle? (Could be why I saw a 10-15% speed improvement going from
RH's 2.4.18 kernel to a stock 2.4.20 kernel.)

Does this mean that altering the queue no longer has an effect?

> The amount of memory you assign when you write a number out to  
> /proc/sys/net/core/rmem_default is a per-socket (not per-thread) maximum.

So is a socket an ip:port thing, or is it a srcip:srcport>dstip:dstport
pair?

> The memory is *not* pre-allocated and is *not* guaranteed to be
> available.

Hence no easy test for how much memory is consumed, though it makes me feel
better about cranking the queue size up another notch.

> The current nfsd code leaves UDP packets on the incoming queue while
> processing them.  This means that we need this limit to be fairly
> high: high enough that one request per thread does not block the queue
> so atleast a few further requests can arrive.
> 
> With this number at the default (64k), I fairly often noticed incoming
> requests being silently dropped because there wasn't enough buffer
> space, even though there were plenty of idle threads.
>
> I raised it to "average packet size * number of threads" and the
> problem went away.  256K seems a reasonable sort of number.

Which comes back to my original post, 

NFS_QS >= RPCNFSDCOUNT * MTU [ Currently 220 * 1500 = 330000, suggesting I
should up my NFS_QS from 262144 to 524288 ]

Or should I be thinking at a higher level of the network stack:

NFS_QS >= RPCNFSDCOUNT * MAX(rsize,wsize) [ Currently 220 * 8192 = 1802240,
suggesting I should up my NFS_QS from 262144 to 2097152 ]

While it sounds like a huge number, we're only talking about 2MB of RAM on a
machine more or less dedicated to NFS.

> This doesn't means that 256K *will* be using, but that (if it is
> available) it *may* be used.  Certainly 256K * number-of-threads will
> NOT be used.

Ah, which raises the question: When will it not be used? I assume we dump
file cache to open these queues, would we swap stuff out of RAM into VM? Or
do we only not get them when there's a real live RAM crunch?

Sorry to be a pain, but I'm very interested in getting the performance of my
NFS servers close to the performance of their disk subsystems so I can
convince the developers to stop running code on them. Last night my backup
copied a 59.53GB file in 7304 sec; by my math that's a rate of 8150.51KB/s
or 65.2Mbps, fairly close to the peak I should see from Full Duplex
Ethernet. A quick test of my disk subsystem:

# SECONDS=0; dd if=/dev/zero of=./Zero.tmp bs=1024 count=25000000; echo $SECONDS
25000000+0 records in
25000000+0 records out
2254

says I can write files at about 11,091 KB/s, so I'm not that far off.
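
The two rates above can be rechecked with a few lines. This is a sketch
under stated assumptions: the backup size is taken as 59.53 million KB
(decimal units), the dd run as 25,000,000 blocks of 1 KB, and kilobits are
converted to megabits by dividing by 1000. The function names are invented
for the example.

```python
def rate_per_second(amount, seconds):
    """Average rate in 'amount units' per second."""
    return amount / seconds


def kbytes_to_mbits(kbytes_per_second):
    """Convert KB/s to Mbit/s (8 bits per byte, 1000 kbit per Mbit)."""
    return kbytes_per_second * 8 / 1000


# Backup over NFS: 59.53 GB (taken as 59.53e6 KB) in 7304 seconds.
backup_kb_s = rate_per_second(59.53e6, 7304)

# Local dd test: 25,000,000 blocks of 1 KB in 2254 seconds.
dd_kb_s = rate_per_second(25_000_000, 2254)

if __name__ == "__main__":
    print("backup: %.1f KB/s, %.1f Mbit/s"
          % (backup_kb_s, kbytes_to_mbits(backup_kb_s)))
    print("dd:     %.1f KB/s" % dd_kb_s)
```

The backup works out to roughly 8150 KB/s, or about 65 Mbit/s, and the dd
run to roughly 11,091 KB/s, in line with the figures quoted in the message.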


* Re: RedHat 8.0 nfs
  2003-03-26 18:34 pwitting
@ 2003-03-26 23:43 ` Neil Brown
  0 siblings, 0 replies; 9+ messages in thread
From: Neil Brown @ 2003-03-26 23:43 UTC (permalink / raw)
  To: pwitting; +Cc: nfs, davidd

On Wednesday March 26, pwitting@Cyveillance.com wrote:
> From: Neil Brown <neilb@cse.unsw.edu.au>
> 
> > Well.... I wrote some of the code, so hopefully I understand what is
> > going on, but don't bet on it :-)
> > 
> > Firstly, for kernels later than about 2.4.20-pre4, this setting is not
> > needed and is ignored.  The nfsd code does 'the right thing'.
> > The setting is only needed for earlier kernels.
> 
> While this sounds great, what exactly is the right thing? Is the code itself
> adjusting the buffers to 256k? 512k? Or is it addressing the problem from a
> different angle? (Could be why I saw a 10-15% speed improvement going from
> RH's 2.4.18 kernel to a stock 2.4.20 kernel.)

Check the calls to "svc_sock_setbufsize" in net/sunrpc/svcsock.c

For UDP, rsize and wsize are (nr_threads+3) * bufsize
For TCP, rsize and wsize are 3 * bufsize

Bufsize is 8K, so you can work it out yourself.

Remember, all this is doing is allowing the networking layer to
allocate enough memory for all threads to be working on something
concurrently.
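
Working that out, as a sketch: the two formulas are taken directly from the
lines above, with bufsize at 8K. The function names are invented for the
example and are not kernel identifiers.

```python
BUFSIZE = 8 * 1024  # bytes, "Bufsize is 8K"


def udp_socket_buffer(nr_threads, bufsize=BUFSIZE):
    """Buffer size for the UDP socket: (nr_threads + 3) * bufsize."""
    return (nr_threads + 3) * bufsize


def tcp_socket_buffer(bufsize=BUFSIZE):
    """Buffer size for a TCP socket: 3 * bufsize."""
    return 3 * bufsize


if __name__ == "__main__":
    for threads in (8, 120, 220):
        print(threads, "threads, UDP:", udp_socket_buffer(threads), "bytes")
    print("TCP:", tcp_socket_buffer(), "bytes")
```

With 220 threads the UDP figure is 223 * 8192 = 1,826,816 bytes, a little
over the 1,802,240 that 220 * 8192 gives.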

> 
> Does this mean that altering the queue no longer has an effect?

Yes.  The setting no longer has any effect on nfsd.

> 
> > The amount of memory you assign when you write a number out to  
> > /proc/sys/net/core/rmem_default is a per-socket (not per-thread) maximum.
> 
> So is a socket an ip:port thing, or is it an srcip:srcport>dstip:dstport
> pair. 

For UDP, which is the only protocol supported in kernels where this
tuning has any effect, it is *:port.  An NFSd only listens on one
socket (no matter how many IP addresses your machine has).

> >
> > I raised it to "average packet size * number of threads" and the
> > problem went away.  256K seems a reasonable sort of number.
> 
> Which comes back to my original post, 
> 
> NFS_QS >= RPCNFSDCOUNT * MTU [ Currently 220 * 1500 = 330000, suggesting I
> should up my NFSQS from 262144 to 524288 ]
> 
> Or should I be thinking at a higher level of the network stack, 
> 
> NFS_QS >= RPCNFSDCOUNT * MAX(rsize,wsize) [ Currently 220 * 8192 = 1802240,
> suggesting I should up my NFSQS from 262144 to 2097152 ]
> 

Yes. MTU is not relevant.  We are talking about NFS requests, not ethernet
packets.  The largest NFS request is a full sized WRITE, which would
be slightly more than 8K.
To be sure that all your 220 threads can be processing a full sized
write concurrently (which is unlikely, as there tend to be a lot of
getattrs spread around) you would need about 1.8Meg.

> While it sounds like a huge number, were only talking about 2MB RAM on a
> machine more or less dedicated to NFS.  

Exactly.  If your machine has 32Meg of RAM, you probably don't need
220 threads.  But if you have 1Gig of RAM, then you won't notice 220
threads (or even more).
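
To put the two cases side by side, a sketch that takes the worst case as
one full-sized 8K WRITE per thread and compares it with total RAM. The 8192
figure slightly understates a real request (described above as "slightly
more than 8K"), and the function names are made up for the example.

```python
MAX_REQUEST = 8 * 1024  # bytes; a full-sized WRITE is slightly larger


def worst_case_queue(threads, request_bytes=MAX_REQUEST):
    """Bytes queued if every thread holds one full-sized request."""
    return threads * request_bytes


def share_of_ram(queue_bytes, ram_bytes):
    """Fraction of total RAM that the worst-case queue would occupy."""
    return queue_bytes / ram_bytes


MIB = 1024 * 1024

if __name__ == "__main__":
    queue = worst_case_queue(220)
    for label, ram in (("32 MB", 32 * MIB), ("1 GB", 1024 * MIB)):
        print("%s RAM: %.2f%% used by a %d-byte queue"
              % (label, share_of_ram(queue, ram) * 100, queue))
```

With 220 threads the worst case is 1,802,240 bytes: a few percent of a 32MB
machine, and well under one percent of a 1GB machine.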

> 
> > This doesn't means that 256K *will* be using, but that (if it is
> > available) it *may* be used.  Certainly 256K * number-of-threads will
> > NOT be used.
> 
> Ah, which raises the question: When will it not be used? I assume we dump
> file cache to open these queues, would we swap stuff out of RAM into VM? Or
> do we only not get them when there's a real live RAM crunch?

That's all up to the VM subsystem.
The VM tries to keep some amount of free memory.  When a network
packet comes in, some of the free memory will be allocated to store
the packet.  If there is no free memory, the packet will be dropped,
but if there is no  free memory then the VM system is already trying
to dump file cache or write out to SWAP to make some free memory.

Also, most of the time your server will have maybe a dozen threads
active and they will be handling small (read/lookup/getattr etc)
requests that don't take up so much memory.  It is just when lots of
writes come in that you use the memory.  And when that happens, you
really want the memory to be used to hold the requests rather than
having the requests be silently dropped.

NeilBrown

