J. Bruce Fields wrote:
> On Mon, Jun 30, 2008 at 11:26:54AM -0400, Jeff Layton wrote:
>> Recently I spent some time with others here at Red Hat looking
>> at problems with nfs server performance. One thing we found was that
>> there are some problems with multiple nfsd's. It seems like the I/O
>> scheduling or something is fooled by the fact that sequential write
>> calls are often handled by different nfsd's. This can negatively
>> impact performance (I don't think we've tracked this down completely
>> yet, however).
> 
> Yes, we've been trying to see how close to full network speed we can get
> over a 10 gig network and have run into situations where increasing the
> number of threads (without changing anything else) seems to decrease
> performance of a simple sequential write.
> 
> And the hypothesis that the problem was randomized IO scheduling was the
> first thing that came to mind.  But I'm not sure what the easiest way
> would be to really prove that that was the problem.

Here's an easy way to check for reads:  instrument the VFS code that
manages read-ahead contexts.  Reads are probably not an issue in
krkumar2's tests, though, since the file used in one of the read tests
is small enough to fit in the server's cache, and the other read test
involves only /dev/null.

I had always thought the wdelay export option would mitigate write
request re-ordering, but I've never looked at how it's implemented in
Linux's nfsd.  Of course, if the client is sending too many COMMIT
requests, that will negate the benefit of wdelay.
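
For the write case, one rough way to test the re-ordering hypothesis
might be to log the offset and length of each WRITE in the order the
server actually issues it, then count how many requests start exactly
where the previous one ended.  The sketch below assumes such a trace is
already available as (offset, length) pairs; the function name and the
trace format are made up for illustration and are not anything nfsd
provides.

```python
def sequential_fraction(trace):
    """Return the fraction of requests that start where the previous one ended.

    trace is a list of (offset, length) pairs in the order the requests
    were issued. This format is an assumption made for this sketch.
    """
    if len(trace) < 2:
        return 1.0  # nothing to compare, so treat the trace as sequential
    sequential = 0
    for (prev_off, prev_len), (off, _) in zip(trace, trace[1:]):
        if off == prev_off + prev_len:
            sequential += 1
    return sequential / (len(trace) - 1)


# Eight 32 KB writes in file order, then the same writes with each
# adjacent pair swapped, as two threads racing each other might produce.
in_order = [(i * 32768, 32768) for i in range(8)]
swapped = [in_order[i] for i in (1, 0, 3, 2, 5, 4, 7, 6)]

print(sequential_fraction(in_order))
print(sequential_fraction(swapped))
```

If that fraction drops as the number of nfsd threads goes up while the
client's request stream stays the same, it would at least be consistent
with the re-ordering theory, though it would not prove that the I/O
scheduler is where the throughput is lost.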