* NFS and RPC queues
From: Jim Callahan @ 2009-03-11 20:00 UTC
  To: linux-nfs

I'm trying to diagnose an NFS performance problem and need to know a bit 
more about how the Linux NFS client works internally.  I've got a single 
NFS client which is connected to a network appliance file server via 1Gb 
ethernet.  We've removed all switches between the client and server to 
remove that variable from the equation.  We have two benchmarking 
scripts, one which generates mostly GETATTR requests and the 
other which generates mostly READ/WRITE requests to the server.  The 
file system is mounted using "actimeo=1".  This is required for our 
purposes since we need file status information which cannot be out of 
date by more than 1 second.

When we run the GETATTR test alone, we see around 6000 requests per 
second.  When we run one or more of the READ/WRITE tests at the same 
time from the same host, we see GETATTR performance drop considerably.  
Here's the average performance based on the number of READ/WRITE 
threads:  1=500/sec, 2=215/sec, 4=110/sec, 8=45/sec, 16=15/sec.   So 
there clearly seems to be some interaction between the response times to 
GETATTR based on the number of READ/WRITE operations going on.  Now, I'd 
assume that there would be some degradation but considering that GETATTR 
is a fast operation for the server to perform and the amount of data 
being sent both ways is very small, it's more than I was expecting.
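
To put the figures above in perspective, here is a small sketch that turns them into slowdown factors and rough per-request times. The per-request time assumes the GETATTR benchmark issues one synchronous request at a time, which is an assumption made purely for illustration; the constant and function names are likewise made up for this example.

```python
# GETATTR rates reported above, keyed by the number of concurrent
# READ/WRITE threads running on the same client.
BASELINE_PER_SEC = 6000
RATES_PER_SEC = {1: 500, 2: 215, 4: 110, 8: 45, 16: 15}


def slowdown(rate_per_sec, baseline=BASELINE_PER_SEC):
    """Factor by which the GETATTR rate dropped relative to the baseline."""
    return baseline / rate_per_sec


def mean_latency_ms(rate_per_sec):
    """Mean time per request in milliseconds.

    ASSUMPTION: a single client thread issuing one request at a time,
    so latency is simply the inverse of the observed rate.
    """
    return 1000.0 / rate_per_sec


if __name__ == "__main__":
    for threads, rate in sorted(RATES_PER_SEC.items()):
        print(f"{threads:2d} thread(s): {slowdown(rate):6.1f}x slower, "
              f"~{mean_latency_ms(rate):5.1f} ms per GETATTR")
```

Under that assumption, each GETATTR goes from well under a millisecond in the standalone test to tens of milliseconds with 16 READ/WRITE threads, which points at waiting time rather than bandwidth.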

We then tried adding a second network card, connection and NFS mount 
from the same client to the same file server.  We segregated all GETATTR 
traffic to use one mount and the READ/WRITE to use the other mount.  
With this setup, GETATTR requests seem to be almost completely unaffected 
by any amount of READ/WRITE activity.  Interestingly, the READ/WRITE 
performance was identical to that in the one network/mount case.  So this 
seemed to validate that the GETATTR requests do not require much bandwidth.

We also performed this same set of tests on a different file server 
which has a completely different architecture and is also much older and 
it delivered nearly identical results.  This is what makes us think it 
might be an NFS client related problem and not the server at all.

Sorry to be so long winded, but I needed a proper context to ask my 
questions...

Is the RPC queue used by NFS processed in a serial fashion for each 
mount to a unique IP address?  One possible explanation is that the 
GETATTR requests are simply waiting in line behind READ/WRITES in the 
one network case and that explains the drop in performance.  So does 
using a different IP address for the second mount create a second RPC 
queue which is processed in an asynchronous manner?  That might explain 
the lack of interaction between the GETATTR's and READ/WRITES in our 
second test...

If this is not the case, can you suggest any theories that might explain 
these results, along with any tests we might perform to validate these 
theories?  Many thanks in advance for any insights you can provide!

-- 
Jim Callahan - President - Temerity Software <www.temerity.us>


* Re: NFS and RPC queues
From: Trond Myklebust @ 2009-03-11 22:07 UTC
  To: Jim Callahan; +Cc: linux-nfs

On Wed, 2009-03-11 at 16:00 -0400, Jim Callahan wrote:
> I'm trying to diagnose an NFS performance problem and need to know a bit 
> more about how the Linux NFS client works internally.  I've got a single 
> NFS client which is connected to a network appliance file server via 1Gb 
                                     NetApp ?
> ethernet.  We've removed all switches between the client and server to 
> remove that variable from the equation.  We have two benchmarking 
> scripts, one which performs generates mostly GETATTR requests and the 
> other which generates mostly READ/WRITE requests from the server.  The 
> file system is mounted using "actimeo=1".  This is required for our 
> purposes since we need file status information which cannot be out of 
> date by more than 1 second.
> 
> When we run the GETATTR test alone, we see around 6000 requests per 
> second.  When we run one or more of the READ/WRITE tests at the same 
> time from the same host, we see GETATTR performance drop considerably.  
> Here's the average performance based on the number of READ/WRITE 
> threads:  1=500/sec, 2=215/sec, 4=110/sec, 8=45/sec, 16=15/sec.   So 
> there clearly seems to be some interaction between the response times to 
> GETATTR based on the number of READ/WRITE operations going on.  Now, I'd 
> assume that there would be some degradation but considering that GETATTR 
> is a fast operation for the server to perform and the amount of data 
> being send both ways is very small its more than I was expecting.
> 
> We then tried adding a second network card, connection and NFS mount 
> from the same client to the same file server.  We segregated all GETATTR 
> traffic to use one mount and the READ/WRITE to use the other mount.  
> With this setup, GETATTR request seem to be almost completely unaffected 
> by any amount of READ/WRITE activity.  Interestingly, the READ/WRITE 
> performance was identical as in the one network/mount case.  So this 
> seemed to validate that the GETATTR requests do not require much bandwidth.
> 
> We also performed this same set of tests on a different file server 
> which has a completely different architecture and is also much older and 
> it delivered nearly identical results.  This this what make us think it 
> might be an NFS client related problem and not the server at all.
> 
> Sorry to be so long winded, but I needed a proper context to ask my 
> questions...
> 
> Is the RPC queue used by NFS processed in a serial fashion for each 
> mount to a unique IP address?  One possible explanation is that the 
> GETATTR requests are simply waiting in line behind READ/WRITES in the 
> one network case and that explains the drop in performance.  So does 
> using a different IP address for the second mount create a second RPC 
> queue which is processed in an asynchronous manner?  That might explain 
> the lack of interaction between the GETATTR's and READ/WRITES in our 
> second test...

Try increasing the value of the sunrpc.tcp_slot_table_entries sysctl
entry on the Linux client. That sysctl controls the maximum number of
simultaneous RPC messages that are allowed on the TCP connection.

The default is 16, but it can be increased to 128. Just make sure that
you umount all NFS partitions before changing it. (Doing 'mount
-oremount' won't work...)
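
A minimal sketch of that procedure, for illustration only: it refuses to change the value while any NFS filesystem is still mounted, then writes the new value to the procfs file behind the sysctl. The paths are the standard Linux locations (the sysctl file only exists once the sunrpc module is loaded), the function names are made up for this example, and the lower-bound check is just a sanity check rather than a documented limit. Writing the real file needs root and is equivalent to running `sysctl -w sunrpc.tcp_slot_table_entries=128`.

```python
from pathlib import Path

SLOT_SYSCTL = Path("/proc/sys/sunrpc/tcp_slot_table_entries")
PROC_MOUNTS = Path("/proc/mounts")
MAX_SLOTS = 128  # upper limit quoted above


def nfs_mount_points(mounts_text):
    """Return the mount points whose filesystem type is nfs or nfs4."""
    points = []
    for line in mounts_text.splitlines():
        fields = line.split()
        # /proc/mounts fields: device, mount point, fstype, options, dump, pass
        if len(fields) >= 3 and fields[2] in ("nfs", "nfs4"):
            points.append(fields[1])
    return points


def set_slot_entries(value, sysctl_path=SLOT_SYSCTL, mounts_path=PROC_MOUNTS):
    """Write a new slot table size, refusing while NFS mounts exist."""
    if not 1 <= value <= MAX_SLOTS:
        raise ValueError(f"value must be between 1 and {MAX_SLOTS}")
    mounted = nfs_mount_points(Path(mounts_path).read_text())
    if mounted:
        raise RuntimeError("unmount these first: " + ", ".join(mounted))
    Path(sysctl_path).write_text(f"{value}\n")


if __name__ == "__main__":
    print("NFS mounts:", nfs_mount_points(PROC_MOUNTS.read_text()))
    if SLOT_SYSCTL.exists():
        print("current slots:", SLOT_SYSCTL.read_text().strip())
```

The path arguments exist so the check can be exercised against ordinary files before touching a live system.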

Trond
-- 
Trond Myklebust
Linux NFS client maintainer

NetApp
Trond.Myklebust@netapp.com
www.netapp.com


* Re: NFS and RPC queues
From: Jim Callahan @ 2009-03-12 20:09 UTC
  To: Trond Myklebust; +Cc: linux-nfs

Trond Myklebust wrote:
> Jim Callahan wrote:
>   
>> I'm trying to diagnose an NFS performance problem and need to know a bit 
>> more about how the Linux NFS client works internally.  I've got a single 
>> NFS client which is connected to a network appliance file server via 1Gb 
>>     
>                                      NetApp ?
>   
Actually, not.   It's one of the other supposedly high performance 
commercial products though...  I'd rather not say which at this time in 
the interest of encouraging their help resolving this.

>> ...
>>
>> Is the RPC queue used by NFS processed in a serial fashion for each 
>> mount to a unique IP address?  One possible explanation is that the 
>> GETATTR requests are simply waiting in line behind READ/WRITES in the 
>> one network case and that explains the drop in performance.  So does 
>> using a different IP address for the second mount create a second RPC 
>> queue which is processed in an asynchronous manner?  That might explain 
>> the lack of interaction between the GETATTR's and READ/WRITES in our 
>> second test...
>>     
>
> Try increasing the value of the sunrpc.tcp_slot_table_entries sysctl
> entry on the Linux client. That sysctl controls the maximum number of
> simultaneous RPC messages that are allowed on the TCP connection.
>
> The default is 16, but it can be increased to 128. Just make sure that
> you umount all NFS partitions before changing it. (Doing 'mount
> -oremount' won't work...)
>
> Trond
>   
Thanks for clearing up what "sunrpc.tcp_slot_table_entries" meant.   It's 
odd however that even with a default size of 16 we should still see 
contention between only two OS level processes generating NFS requests 
though.  Shouldn't that only be using 2 of the 16 entries?  Even our 
worst case would only generate 17 processes running in parallel.  Or do 
I misunderstand the granularity?   If I have an NFS rsize/wsize of 32k 
and perform a C read()/write() of 64k, this will get chunked into two 
32k READ/WRITE NFS requests, correct?  But do both of these low level 
requests from a single process and read()/write() call get run in 
parallel if there are at least 2 slots in the RPC table available?  In 
other words, is there an interaction between the rsize/wsize and the 
number of slots in the RPC table that are used for a single C 
read()/write()?  Would increasing the rsize/wsize for files larger than 
this size lead to less utilization of the RPC table and therefore less 
contention?

Also, can someone please answer whether there is an RPC request table 
for each NFS mount to a different IP address or one shared table for all 
mounts?  I'd like to have a better understanding of why running our 
GETATTR requests down a different mount would eliminate the contention 
with the READ/WRITE requests going down the other mount.

We'll definitely be bumping up the "sunrpc.tcp_slot_table_entries" to 
128 and re-running our tests though.  Thanks for the help so far...

-- 
Jim Callahan - President - Temerity Software <www.temerity.us>


* Re: NFS and RPC queues
From: Trond Myklebust @ 2009-03-12 21:30 UTC
  To: Jim Callahan; +Cc: linux-nfs

On Thu, 2009-03-12 at 16:09 -0400, Jim Callahan wrote:
> Thanks for clearing up what "sunrpc.tcp_slot_table_entries" meant.   Its 
> odd however that even with a default size of 16 we should still see 
> contention between only two OS level processes generating NFS requests 
> though.  Shouldn't that only be using 2 of the 16 entries?  Even our 
> worse case would only generate 17 processes running in parallel.  Or do 
> I misunderstand the granularity?   If I have an NFS rsize/wsize of 32k 
> and perform a C read()/write() of 64k, this will get chunked into two 
> 32k READ/WRITE NFS requests, correct?  But do both of these low level 
> requests from a single process and read()/write() call get run in 
> parallel if there are at least 2 slots in the RPC table available?  In 
> other words, is there an interaction between the rsize/wsize and the 
> number of slots in the RPC table that are used for a single C 
> read()/write()?  Would increasing the rsize/wsize for files larger than 
> this size lead to less utilization of the RPC table and therefore less 
> contention?

The kernel readahead algorithm will cause a single process to create
more than 1 on-the-wire READ request at a time: the process won't wait
for the first READ request to complete before sending the next. The
different READ requests will execute in parallel, and will contend for
slots.

Write-behind does the same for WRITE requests.

You should note, however, that a single process is not allowed to use
more than 16 slots before it has to allow other processes access to the
slot table (however if no other processes have outstanding RPC requests,
then it is allowed to hog the whole table).
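
As a back-of-the-envelope sketch of the arithmetic involved: the first helper answers the 64k-over-32k question from the quoted text, and the second shows how quickly a slot table fills once each READ/WRITE stream keeps several requests in flight. The figure of 8 in-flight requests per stream is a hypothetical value chosen for illustration; the real readahead and write-behind depth depends on the kernel and the workload, and the function names are made up for this example.

```python
def rpcs_for_io(nbytes, xfer_size):
    """READ or WRITE RPCs needed to move nbytes when rsize/wsize is xfer_size."""
    if nbytes < 0 or xfer_size <= 0:
        raise ValueError("nbytes must be >= 0 and xfer_size must be > 0")
    return -(-nbytes // xfer_size)  # ceiling division


def free_slots(table_size, streams, inflight_per_stream):
    """Slots left over once every stream has its requests in flight."""
    return max(0, table_size - streams * inflight_per_stream)


if __name__ == "__main__":
    # A 64k read()/write() with rsize/wsize of 32k.
    print("RPCs for 64k at 32k:", rpcs_for_io(64 * 1024, 32 * 1024))

    # HYPOTHETICAL: each stream keeps 8 requests in flight.
    for table_size in (16, 128):
        for streams in (1, 2, 4, 8, 16):
            left = free_slots(table_size, streams, 8)
            print(f"table={table_size:3d} streams={streams:2d} free slots={left}")
```

With those assumed numbers, two streams are already enough to leave no free slot in a 16-entry table, so a GETATTR from another process has to wait for one to be released.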

> Also, can someone please answer whether there is an RPC request table 
> for each NFS mount to a different IP address or one shared table for all 
> mounts?  I'd like to have a better understanding of why running our 
> GETATTR requests down a different mount would eliminate the contention 
> with the READ/WRITE requests going down the other mount.

That depends on which kernel you are running. More recent kernels will
tend to multiplex traffic through a slot table that is shared between
all mounts to a single IP address.
Mounts to different IP addresses will, however, never share a slot table.
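
A toy model of that rule, covering only the "more recent kernels" case described above: mounts are grouped by server IP address, and each group stands for one shared slot table. The addresses and mount points are hypothetical (taken from the documentation address range), and the function name is made up for this example; none of this reflects actual kernel data structures.

```python
from collections import defaultdict


def group_by_server(mounts):
    """Group (server_ip, mount_point) pairs by server IP.

    Each key models one slot table shared by all mounts to that address.
    """
    tables = defaultdict(list)
    for server_ip, mount_point in mounts:
        tables[server_ip].append(mount_point)
    return dict(tables)


if __name__ == "__main__":
    # HYPOTHETICAL layouts: one server reached via one or two addresses.
    single_nic = [("192.0.2.10", "/mnt/attr"), ("192.0.2.10", "/mnt/data")]
    dual_nic = [("192.0.2.10", "/mnt/attr"), ("192.0.2.11", "/mnt/data")]

    for name, mounts in (("single", single_nic), ("dual", dual_nic)):
        tables = group_by_server(mounts)
        print(f"{name}: {len(tables)} slot table(s) -> {tables}")
```

In the dual-address layout the GETATTR mount and the READ/WRITE mount end up in separate groups, which matches the lack of interaction seen in the two-card test.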

-- 
Trond Myklebust
Linux NFS client maintainer

NetApp
Trond.Myklebust@netapp.com
www.netapp.com
