poor nfs performance & hangs with latest kernels

* poor nfs performance & hangs with latest kernels
@ 2007-02-19 14:49 Rich
  2007-02-20  9:45 ` Neil Brown
  0 siblings, 1 reply; 8+ messages in thread
From: Rich @ 2007-02-19 14:49 UTC
  To: nfs

hi. i am having some pretty weird nfs performance problems.
(please, cc me, as i am not on the list).

when there is some intensive nfs activity (write), all other nfs 
operations slow down to a crawl or even stop completely during that time.

i have been able to reproduce the problem with kernel versions 
2.6.16.40, 2.6.19.2 and 2.6.20 (on  slackware-11.0).
another person reproduced the hang with 2.6.19-1.2911.fc6 (fedora core 6).

when the problem appears, access to the same data both locally and even 
over ssh happens without any slowdown, but nfs access is sometimes 
slowed down significantly, in some cases to the point of being unable to 
list a directory for 30 minutes.
in some cases nfs not only slows down, but the whole system hangs.

i have tried nfstools 1.0.10 and 1.0.7 & portmap 5.0 (though their 
versions seem to have no impact).

there is one scenario where it is very easy to reproduce the problem 
(note : don't try this on a  remote system or one you can not afford to 
hard reboot) :

export a local directory. i'm using 
localhost(rw,no_root_squash,sync,no_subtree_check).
mount it locally and try to perform a write operation :
dd if=/dev/zero of=/mounted_nfs/testfile bs=512k count=2048

depending on the machine, i have managed to get files from 32mb (my 
workstation) to ~ 880mb (server). then either nfs becomes inaccessible 
(for all hosts and all disks/exported filesystems), or the machine hangs 
completely. if the machine does not hang, its load increases steadily, and 
all cpus are shown to have very high iowait load.

using 2.6.16.21, i was unable to hang my workstation, but server, even 
though it survived the test, is still having excessive load (~ 4). top 
lists as most resource hungry processes nfsd, kjournald and
kblockd.
in this case, stopping nfs server seems to lower the load and return the 
system to normal state.

this seems to be very similar to an already reported bug, 
http://bugme.osdl.org/show_bug.cgi?id=7943

has anybody else seen such a behaviour ?
i would appreciate if somebody could test this or provide additional 
things to test. flips on #linux-nfs noted that "it would be good to get 
a sysrq-t listing". if required, i could try this on my workstation with 
either 16.21, 19.1 or .20 - though, if really needed, probably with any 
kernel version :)
in such a case i would be glad to receive some guidance how to best 
perform this activity and gather output.
thanks.
-- 
  Rich

-------------------------------------------------------------------------
Take Surveys. Earn Cash. Influence the Future of IT
Join SourceForge.net's Techsay panel and you'll get the chance to share your
opinions on IT & business topics through brief surveys-and earn cash
http://www.techsay.com/default.php?page=join.php&p=sourceforge&CID=DEVDEV
_______________________________________________
NFS maillist  -  NFS@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nfs

* Re: poor nfs performance & hangs with latest kernels
  2007-02-19 14:49 poor nfs performance & hangs with latest kernels Rich
@ 2007-02-20  9:45 ` Neil Brown
  2007-02-20 12:45   ` Rich
  2007-02-21 13:40   ` Jean-Noel Bouvier
  0 siblings, 2 replies; 8+ messages in thread
From: Neil Brown @ 2007-02-20  9:45 UTC
  To: Rich; +Cc: nfs

On Monday February 19, rich@hq.vsaa.lv wrote:
> hi. i am having a pretty weird nfs performance problems.
> (please, cc me, as i am not on the list).
> 
> when there is some intensive nfs activity (write), all other nfs 
> operations slow down to crawl or even stop at all during that time.
> 
> i have been able to reproduce the problem with kernel versions 
> 2.6.16.40, 2.6.19.2 and 2.6.20 (on  slackware-11.0).
> another person reproduced the hang with 2.6.19-1.2911.fc6 (fedora core 6).

Are there any kernels where you cannot reproduce the problem?

> 
> when the problem appears, access to the same data both locally and even 
> over ssh is happening  without any slowdown, but nfs access is sometimes 
> slowed down significantly, in some cases  even being unable to list a 
> directory for 30 minutes.
> in some cases, not only nfs slowdown happens, but whole system hangs.

So we need to find out exactly what is happening when things slow
down.
Some things that might be useful:
  a tcpdump trace (use -s 0) of traffic while things are going slowly.
  "cat /proc/meminfo /proc/slabinfo".  Get a copy when everything is
     fine, then another few when things are going slowly.
  Maybe "echo t > /proc/sysrq-trigger" and collect the kernel logs.
    If some processes are in 'D' status, this could give useful
    information.

Get the various information on both the server and client if
possible.  Hopefully somewhere in all of that will be a clue.
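
A minimal sketch of how the collection steps above could be scripted,
assuming a POSIX shell and root access. The function name
`snapshot_proc`, the `PROC_DIR` override and the `nfs-diag` output
directory are hypothetical, and filtering on port 2049 assumes the NFS
traffic uses the standard port:

```shell
#!/bin/sh
# Hypothetical helper: save /proc/meminfo and /proc/slabinfo under a label.
# PROC_DIR can be overridden only so the function can be tried without root.
snapshot_proc() {
    label=$1
    out_dir=${2:-./nfs-diag}
    proc_dir=${PROC_DIR:-/proc}
    mkdir -p "$out_dir" || return 1
    for f in meminfo slabinfo; do
        # slabinfo is often readable by root only; skip what cannot be read
        if [ -r "$proc_dir/$f" ]; then
            cat "$proc_dir/$f" > "$out_dir/$label.$f"
        fi
    done
}

# Possible use (as root): once while idle, then a few times while slow.
#   snapshot_proc baseline
#   tcpdump -s 0 -w nfs-diag/slow.pcap port 2049 &
#   snapshot_proc slow-1
#   ps -eo pid,stat,wchan,comm | awk '$2 ~ /^D/'    # processes in 'D' state
#   echo t > /proc/sysrq-trigger; dmesg > nfs-diag/slow-1.sysrq
```

Running the same steps on both server and client, with matching labels,
should make the before/after copies easy to compare.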

> there is one scenario where it is very easy to reproduce the problem 
> (note : don't try this on a  remote system or one you can not afford to 
> hard reboot) :
> 
> export a local directory. i'm using 
> localhost(rw,no_root_squash,sync,no_subtree_check).
> mount it locally and try to perform a write operation :
> dd if=/dev/zero of=/mounted_nfs/testfile bs=512k count=2048

This scenario is known to cause problems, is very hard to fix, and is
a case of "well don't do that then".  The problems here are probably
unrelated to the problems you are having between separate machines.

> 
> using 2.6.16.21, i was unable to hang my workstation, but server, even 
> though it survived the test, is still having excessive load (~ 4). top 
> lists as most resource hungry processes nfsd, kjournald and
> kblockd.

So 2.6.16.21 survives but 2.6.16.40 doesn't?  Is that a reliable
result?  Is that with separate server and client, or server and client
on the same machine?

NeilBrown

* Re: poor nfs performance & hangs with latest kernels
  2007-02-20  9:45 ` Neil Brown
@ 2007-02-20 12:45   ` Rich
  2007-02-21 13:40   ` Jean-Noel Bouvier
  1 sibling, 0 replies; 8+ messages in thread
From: Rich @ 2007-02-20 12:45 UTC
  To: Neil Brown; +Cc: nfs

Neil Brown wrote:
> On Monday February 19, rich@hq.vsaa.lv wrote:
...
>> when there is some intensive nfs activity (write), all other nfs 
>> operations slow down to crawl or even stop at all during that time.
>>
>> i have been able to reproduce the problem with kernel versions 
>> 2.6.16.40, 2.6.19.2 and 2.6.20 (on  slackware-11.0).
>> another person reproduced the hang with 2.6.19-1.2911.fc6 (fedora core 6).
> 
> Are there any kernels where you cannot reproduce the problem?

well, now that my localhost tests are invalidated, i will have to do 
additional testing :)

...
>> export a local directory. i'm using 
>> localhost(rw,no_root_squash,sync,no_subtree_check).
>> mount it locally and try to perform a write operation :
>> dd if=/dev/zero of=/mounted_nfs/testfile bs=512k count=2048
> 
> This scenario is known to cause problems, is very hard to fix, and is
> a case of "well don't do that then".  The problems here are probably
> unrelated to the problems you are having between separate machines.

there i was, hoping to have found a reliable method to reproduce the 
problem...
btw, what's the main cause for this problem ? could it also be observed 
on fast networks or is it limited to cases when server & client are on 
the same machine ?

>> using 2.6.16.21, i was unable to hang my workstation, but server, even 
>> though it survived the test, is still having excessive load (~ 4). top 
>> lists as most resource hungry processes nfsd, kjournald and
>> kblockd.
> 
> So 2.6.16.21 survives but 2.6.16.40 doesn't?  Is that a reliable
> result?  Is that with separate server and client, or server and client
> on the same machine?

most tests were done on a single machine (3 different ones, though), 
after i observed initial problems between several clients and single 
server. so now i will have to redo all the tests with separate client & 
server machines :)

> NeilBrown
-- 
  Rich

* Re: poor nfs performance & hangs with latest kernels
  2007-02-20  9:45 ` Neil Brown
  2007-02-20 12:45   ` Rich
@ 2007-02-21 13:40   ` Jean-Noel Bouvier
  2007-02-22  2:28     ` Neil Brown
  1 sibling, 1 reply; 8+ messages in thread
From: Jean-Noel Bouvier @ 2007-02-21 13:40 UTC
  To: Neil Brown; +Cc: rich, nfs

Hello,

I see the same bad NFS performance with kernels newer than 2.6.16.31.

Tests : tar -xvf linux-2.4.32.tar

Environment :
- client 2.6.16.31
- client mounts an XFS file system through NFS on a remote machine with
options = rw,tcp,intr
- server : exporting file system with options = rw,sync,insecure

Results : (according to server kernel version)
2.6.15.7 => 1 minute 10 sec
2.6.16.31 => 1 minute 02 sec
2.6.17.14 => 15 minutes
2.6.18.3 => 15 minutes

Neil Brown wrote:
> On Monday February 19, rich@hq.vsaa.lv wrote:
> 
>>hi. i am having a pretty weird nfs performance problems.
>>(please, cc me, as i am not on the list).
>>
>>when there is some intensive nfs activity (write), all other nfs 
>>operations slow down to crawl or even stop at all during that time.
>>
>>i have been able to reproduce the problem with kernel versions 
>>2.6.16.40, 2.6.19.2 and 2.6.20 (on  slackware-11.0).
>>another person reproduced the hang with 2.6.19-1.2911.fc6 (fedora core 6).
> 
> 
> Are there any kernels where you cannot reproduce the problem?

2.6.16.31 and 2.6.15.7 are OK.

>>when the problem appears, access to the same data both locally and even 
>>over ssh is happening  without any slowdown, but nfs access is sometimes 
>>slowed down significantly, in some cases  even being unable to list a 
>>directory for 30 minutes.
>>in some cases, not only nfs slowdown happens, but whole system hangs.
> 
> 
> So we need to find out exactly what is happening when things slow
> down.
> Some things that might be useful:
>   a tcpdump trace (use -s 0) of traffic which things are going slowly.
>   "cat /proc/meminfo /proc/slabinfo".  Get a copy when everything is
>      fine, then another few then things are going slowly.
>   Maybe "echo t > /proc/sysrq-trigger" and collect the kernel logs.
>     If some processes are in 'D' status, this could give useful
>     information.

Results :

- 2.6.16.31 : load average 1; 5 threads and nfsd process are never with
'D' status.
- 2.6.18.3 : load average 5; 2 threads and nfsd process are often with
'D' status.

> Get the various information on both the server and client if
> possible.  Hopefully somewhere in all of that will be a clue.
> 
> 
>>there is one scenario where it is very easy to reproduce the problem 
>>(note : don't try this on a  remote system or one you can not afford to 
>>hard reboot) :
>>
>>export a local directory. i'm using 
>>localhost(rw,no_root_squash,sync,no_subtree_check).
>>mount it locally and try to perform a write operation :
>>dd if=/dev/zero of=/mounted_nfs/testfile bs=512k count=2048
> 
> 
> This scenario is known to cause problems, is very hard to fix, and is
> a case of "well don't do that then".  The problems here are probably
> unrelated to the problems you are having between separate machines.
> 
> 
>>using 2.6.16.21, i was unable to hang my workstation, but server, even 
>>though it survived the test, is still having excessive load (~ 4). top 
>>lists as most resource hungry processes nfsd, kjournald and
>>kblockd.
> 
> 
> So 2.6.16.21 survives but 2.6.16.40 doesn't?  Is that a reliable
> result?  Is that with separate server and client, or server and client
> on the same machine?
> 
> NeilBrown

* Re: poor nfs performance & hangs with latest kernels
  2007-02-21 13:40   ` Jean-Noel Bouvier
@ 2007-02-22  2:28     ` Neil Brown
  2007-02-22 19:42       ` Norman Weathers
  2007-03-05 13:30       ` Jean-Noel BOUVIER
  0 siblings, 2 replies; 8+ messages in thread
From: Neil Brown @ 2007-02-22  2:28 UTC
  To: Jean-Noel Bouvier; +Cc: rich, nfs

On Wednesday February 21, jean-noel.bouvier@imag.fr wrote:
> Hello,
> 
> I encounter the same NFS bad performance for kernels newer than 2.6.16.31
> 
> Tests : tar -xvf linux-2.4.32.tar
> 
> Environment :
> - client 2.6.16.31
> - client mounts a XFS file system through NFS on a remote machine with
> options = rw,tcp,intr
> - server : exporting file system with options = rw,sync,insecure
> 
> Results : (according to server kernel version)
> 2.6.15.7 => 1 minute 10 sec
> 2.6.16.31 => 1 minute 02 sec
> 2.6.17.14 => 15 minutes
> 2.6.18.3 => 15 minutes

Those look like very strong results....

Could you try this patch on one of the later kernels and see if it
helps?
Otherwise we might have to do the 'git bisect' thing to find the
offending patch.

Thanks,
NeilBrown


Status: ok

Stop NFSD writes from being broken into lots of little writes to filesystem.

When NFSD receives a write request, the data is typically in a number
of 1448 byte segments and writev is used to collect them together.

Unfortunately, generic_file_buffered_write passes these to the filesystem
one at a time, so an e.g. 32K over-write becomes a series of partial-page
writes to each page, causing the filesystem to have to pre-read those
pages - wasted effort.
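
As a rough illustration of the paragraph above (not the kernel code
itself), the following sketch counts how many separate copies a 32K
write needs when each copy is capped at the current segment, compared
with capping only at page boundaries.  It assumes 4096-byte pages and a
write that starts page-aligned; the function name `count_copies` is
hypothetical.

```shell
#!/bin/sh
# count_copies TOTAL SEG PAGE [split]
# Counts the copy steps needed to write TOTAL bytes that arrive in SEG-byte
# iovec segments into PAGE-byte pages.  With "split", each step is also
# capped at the end of the current segment (the behaviour described above).
count_copies() {
    total=$1 seg=$2 page=$3 mode=${4:-nosplit}
    pos=0 n=0
    while [ "$pos" -lt "$total" ]; do
        step=$((page - pos % page))            # room left in this page
        if [ "$mode" = split ]; then
            seg_left=$((seg - pos % seg))      # room left in this segment
            [ "$seg_left" -lt "$step" ] && step=$seg_left
        fi
        # never copy past the end of the write
        [ $((total - pos)) -lt "$step" ] && step=$((total - pos))
        pos=$((pos + step))
        n=$((n + 1))
    done
    echo "$n"
}

count_copies 32768 1448 4096 split   # prints 30
count_copies 32768 1448 4096         # prints 8
```

Under these assumptions every one of the 30 split copies is smaller
than a page, so each of the 8 pages is written in pieces; without the
per-segment cap each page is written once, in full.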

generic_file_buffered_write handles one segment of the vector at a
time as it has to pre-fault in each segment to avoid deadlocks.  When
writing from kernel-space (and nfsd does) this is not an issue, so
generic_file_buffered_write does not need to break an iovec from nfsd
into little pieces.

This patch avoids the splitting when  get_fs is KERNEL_DS as it is
from NFSd.

This issue was introduced by commit 6527c2bdf1f833cc18e8f42bd97973d583e4aa83

Cc: Nick Piggin <nickpiggin@yahoo.com.au>
Cc: Norman Weathers <norman.r.weathers@conocophillips.com>
Cc: Vladimir V. Saveliev <vs@namesys.com>

Signed-off-by: Neil Brown <neilb@suse.de>

### Diffstat output
 ./.patches/current/mm/filemap.c |   32 +++++++++++++++++++-------------
 1 file changed, 19 insertions(+), 13 deletions(-)

diff .prev/mm/filemap.c ./mm/filemap.c
--- ./mm/filemap.c	2007-02-16 13:49:40.000000000 +1100
+++ ./.patches/current/mm/filemap.c	2007-02-16 13:55:39.000000000 +1100
@@ -2137,21 +2137,27 @@ generic_file_buffered_write(struct kiocb
 		/* Limit the size of the copy to the caller's write size */
 		bytes = min(bytes, count);
 
-		/*
-		 * Limit the size of the copy to that of the current segment,
-		 * because fault_in_pages_readable() doesn't know how to walk
-		 * segments.
+		/* We only need to worry about prefaulting when writes are from
+		 * user-space.  NFSd uses vfs_writev with several non-aligned
+		 * segments in the vector, and limiting to one segment a time is
+		 * a noticeable performance for re-write
 		 */
-		bytes = min(bytes, cur_iov->iov_len - iov_base);
-
-		/*
-		 * Bring in the user page that we will copy from _first_.
-		 * Otherwise there's a nasty deadlock on copying from the
-		 * same page as we're writing to, without it being marked
-		 * up-to-date.
-		 */
-		fault_in_pages_readable(buf, bytes);
+		if (!segment_eq(get_fs(), KERNEL_DS)) {
+			/*
+			 * Limit the size of the copy to that of the current
+			 * segment, because fault_in_pages_readable() doesn't
+			 * know how to walk segments.
+			 */
+			bytes = min(bytes, cur_iov->iov_len - iov_base);
 
+			/*
+			 * Bring in the user page that we will copy from
+			 * _first_.  Otherwise there's a nasty deadlock on
+			 * copying from the same page as we're writing to,
+			 * without it being marked up-to-date.
+			 */
+			fault_in_pages_readable(buf, bytes);
+		}
 		page = __grab_cache_page(mapping,index,&cached_page,&lru_pvec);
 		if (!page) {
 			status = -ENOMEM;

* Re: poor nfs performance & hangs with latest kernels
  2007-02-22  2:28     ` Neil Brown
@ 2007-02-22 19:42       ` Norman Weathers
  2007-02-22 20:34         ` Trond Myklebust
  2007-03-05 13:30       ` Jean-Noel BOUVIER
  1 sibling, 1 reply; 8+ messages in thread
From: Norman Weathers @ 2007-02-22 19:42 UTC
  To: nfs


Neil,

I don't remember if I responded when you guys sent out the patch
originally.  It indeed helps the rewrite situation quite a bit with the
2.6.20 kernel.  Hopefully, it will help Jean-Noel's problem as well.
Thanks for getting that pushed through.

There is another issue, and I am not sure what it is related to.
About 18 months ago, we did some performance testing with our cluster
nodes, and noticed very good performance using iozone (good performance
meaning we had ~ 170 MB/s writes, even better for the rewrite mode, and
off the chart with read, due to caching of course).  Now, after a couple
of updates, we have noticed that our writes and reads are about the
same, but our rewrites using iozone are sometimes 1/2 of the performance
of the writes.  Also, during that time, the load average (10 nodes,
gigabit connected, server bonded dual gigabit to our core switch) is
anywhere from 15 to 32, and as it gets higher (closer to the 32), it can
cause the server to be really slow.  During this time, the disk io wait
is high, but processor load is low.  We have <>updated<> our clients
from an older 2.6.12.1 to a 2.6.14-ck1 patch, and the servers from a
2.6.11.7 to 2.6.14-ck1 as well.  Have there possibly been some big changes
in the way clients or servers request write information during an iozone
rewrite?  If I truncate the file and rewrite from scratch, I
can almost always get the > 170 MB/s on the wire to my boxes...

So, the patch does now allow the iozones to complete, but there is still
a high load on the boxes.

Any testing or other information I can get that will help, please let me
know.

Pertinent Information:
64 bit servers and clients
FC3 and FC4 on the clients using 2.6.14 and later kernels
FC3 and FC6 on the servers using 2.6.14 and later kernels
All hosts are gigabit ethernet with the servers connected directly to
our core switches, and clients connected to gigabit edge switches that
are quad trunked back to the core (~ 64 nodes on the edge switch, but
all nodes are idle during the test).

Norman Weathers



On Thu, 2007-02-22 at 13:28 +1100, Neil Brown wrote:
> On Wednesday February 21, jean-noel.bouvier@imag.fr wrote:
> > Hello,
> > 
> > I encounter the same NFS bad performance for kernels newer than 2.6.16.31
> > 
> > Tests : tar -xvf linux-2.4.32.tar
> > 
> > Environment :
> > - client 2.6.16.31
> > - client mounts a XFS file system through NFS on a remote machine with
> > options = rw,tcp,intr
> > - server : exporting file system with options = rw,sync,insecure
> > 
> > Results : (according to server kernel version)
> > 2.6.15.7 => 1 minute 10 sec
> > 2.6.16.31 => 1 minute 02 sec
> > 2.6.17.14 => 15 minutes
> > 2.6.18.3 => 15 minutes
> 
> Those look like very strong results....
> 
> Could you try this patch on one of the later kernels and see if it
> helps?
> Otherwise we might have do to the 'git bisect' thing to find the
> offending patch.
> 
> Thanks,
> NeilBrown
> 
> 
> Status: ok
> 
> Stop NFSD writes from being broken into lots of little writes to filesystem.
> 
> When NFSD receives a write request, the data is typically in a number
> of 1448 byte segments and writev is used to collect them together.
> 
> Unfortunately, generic_file_buffered_write passes these to the filesystem
> one at a time, so an e.g. 32K over-write becomes a series of partial-page
> writes to each page, causing the filesystem to have to pre-read those
> pages - wasted effort.
> 
> generic_file_buffered_write handles one segment of the vector at a
> time as it has to pre-fault in each segment to avoid deadlocks.  When
> writing from kernel-space (and nfsd does) this is not an issue, so
> generic_file_buffered_write does not need to break and iovec from nfsd
> into little pieces.
> 
> This patch avoids the splitting when  get_fs is KERNEL_DS as it is
> from NFSd.
> 
> This issue was introduced by commit 6527c2bdf1f833cc18e8f42bd97973d583e4aa83
> 
> Cc: Nick Piggin <nickpiggin@yahoo.com.au>
> Cc: Norman Weathers <norman.r.weathers@conocophillips.com>
> Cc: Vladimir V. Saveliev <vs@namesys.com>
> 
> Signed-off-by: Neil Brown <neilb@suse.de>
> 
> ### Diffstat output
>  ./.patches/current/mm/filemap.c |   32 +++++++++++++++++++-------------
>  1 file changed, 19 insertions(+), 13 deletions(-)
> 
> diff .prev/mm/filemap.c ./mm/filemap.c
> --- ./mm/filemap.c	2007-02-16 13:49:40.000000000 +1100
> +++ ./.patches/current/mm/filemap.c	2007-02-16 13:55:39.000000000 +1100
> @@ -2137,21 +2137,27 @@ generic_file_buffered_write(struct kiocb
>  		/* Limit the size of the copy to the caller's write size */
>  		bytes = min(bytes, count);
>  
> -		/*
> -		 * Limit the size of the copy to that of the current segment,
> -		 * because fault_in_pages_readable() doesn't know how to walk
> -		 * segments.
> +		/* We only need to worry about prefaulting when writes are from
> +		 * user-space.  NFSd uses vfs_writev with several non-aligned
> +		 * segments in the vector, and limiting to one segment a time is
> +		 * a noticeable performance for re-write
>  		 */
> -		bytes = min(bytes, cur_iov->iov_len - iov_base);
> -
> -		/*
> -		 * Bring in the user page that we will copy from _first_.
> -		 * Otherwise there's a nasty deadlock on copying from the
> -		 * same page as we're writing to, without it being marked
> -		 * up-to-date.
> -		 */
> -		fault_in_pages_readable(buf, bytes);
> +		if (!segment_eq(get_fs(), KERNEL_DS)) {
> +			/*
> +			 * Limit the size of the copy to that of the current
> +			 * segment, because fault_in_pages_readable() doesn't
> +			 * know how to walk segments.
> +			 */
> +			bytes = min(bytes, cur_iov->iov_len - iov_base);
>  
> +			/*
> +			 * Bring in the user page that we will copy from
> +			 * _first_.  Otherwise there's a nasty deadlock on
> +			 * copying from the same page as we're writing to,
> +			 * without it being marked up-to-date.
> +			 */
> +			fault_in_pages_readable(buf, bytes);
> +		}
>  		page = __grab_cache_page(mapping,index,&cached_page,&lru_pvec);
>  		if (!page) {
>  			status = -ENOMEM;

* Re: poor nfs performance & hangs with latest kernels
  2007-02-22 19:42       ` Norman Weathers
@ 2007-02-22 20:34         ` Trond Myklebust
  0 siblings, 0 replies; 8+ messages in thread
From: Trond Myklebust @ 2007-02-22 20:34 UTC
  To: Norman Weathers; +Cc: nfs

On Thu, 2007-02-22 at 13:42 -0600, Norman Weathers wrote:
> Neil,
> 
> I don't remember if I responded when you guys sent out the patch
> originally.  It indeed helps the rewrite situation quite a bit with the
> 2.6.20 kernel.  Hopefully, it will help Jean-Noel's problem as well.
> Thanks for getting that pushed through.
> 
> There is another issue, and I am not for sure where it is related.
> About 18 months ago, we did some performance testing with our cluster
> nodes, and noticed very good performance using iozone (good performance
> meaning we had ~ 170 MB/s writes, even better for the rewrite mode, and
> off the chart with read, due to caching of course).  Now, after a couple
> of updates, we have noticed that our writes and reads are about the
> same, but our rewrites using iozone are sometimes 1/2 of the performance
> of the writes.  Also, during that time, the load average (10 nodes,
> gigabit connected, server bonded dual gigabit to our core switch) is
> anywhere from 15 to 32, and as it gets higher (closer to the 32), it can
> cause the server to be really slow.  During this time, the disk io wait
> is high, but processor load is low.  We have <>updated<> our clients
> from an older 2.6.12.1 to a 2.6.14-ck1 patch, and the servers from a
> 2.6.11.7 to 2.6.14-ck1 as well.  Has there been some big changes in the
> way clients or servers are requesting write information during an iozone
> rewrite possibly.  If I truncate the file and rewrite from scratch, I
> can almost always get the > 170 MB/s on the wire to my boxes...
> 
> So, the patch does now allow the iozones to complete, but there is still
> a high load on the boxes.
> 
> Any testing or other information I can get that will help, please let me
> know.
> 
> Pertinent Information:
> 64 bit servers and clients
> FC3 and FC4 on the clients using 2.6.14 and later kernels
> FC3 and FC6 on the servers using 2.6.14 and later kernels
> All hosts are gigabit ethernet with the servers connected directly to
> our core switches, and clients connected to gigabit edge switches that
> are quad trunked back to the core (~ 64 nodes on the edge switch, but
> all nodes are idle during the test).
> 
> Norman Weathers

I'm confused. Are you seeing this with a stock 2.6.20 kernel, or only
the older kernels?

Cheers
  Trond


* Re: poor nfs performance & hangs with latest kernels
  2007-02-22  2:28     ` Neil Brown
  2007-02-22 19:42       ` Norman Weathers
@ 2007-03-05 13:30       ` Jean-Noel BOUVIER
  1 sibling, 0 replies; 8+ messages in thread
From: Jean-Noel BOUVIER @ 2007-03-05 13:30 UTC
  To: Neil Brown; +Cc: rich, nfs

Hello,

On Thu, 22 Feb 2007, Neil Brown wrote:

> On Wednesday February 21, jean-noel.bouvier@imag.fr wrote:
>> Hello,
>>
>> I encounter the same NFS bad performance for kernels newer than 2.6.16.31
>>
>> Tests : tar -xvf linux-2.4.32.tar
>>
>> Environment :
>> - client 2.6.16.31
>> - client mounts a XFS file system through NFS on a remote machine with
>> options = rw,tcp,intr
>> - server : exporting file system with options = rw,sync,insecure
>>
>> Results : (according to server kernel version)
>> 2.6.15.7 => 1 minute 10 sec
>> 2.6.16.31 => 1 minute 02 sec
>> 2.6.17.14 => 15 minutes
>> 2.6.18.3 => 15 minutes
>
> Those look like very strong results....
>
> Could you try this patch on one of the later kernels and see if it
> helps?

Unfortunately, this patch did not improve performance:

Linux 2.6.16.31/XFS+NFS => 1min (4 processes with status S+S+R+D)
Linux 2.6.17.14/XFS+NFS => 15min (2 processes with status S+D)
Linux 2.6.17.14/XFS+NFS + patch => 14min51 (2 processes with status S+D)

But I also ran the same test on XFS and EXT3 file systems
***without*** NFS, just to be sure:

Linux 2.6.16.31/XFS => 0 min 12s | Linux 2.6.16.31/EXT3 => 0 min 07s
Linux 2.6.17.14/XFS => 0 min 51s | Linux 2.6.17.14/EXT3 => 0 min 04s
Linux 2.6.18-4/XFS  => 0 min 57s | Linux 2.6.18-4/EXT3  => 0 min 02s

So I am afraid the problem is tied to XFS on kernels newer than 2.6.16
rather than to NFS.
Do you agree, or do you want me to run other tests to be sure NFS is not
involved?
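
To put the timings above side by side, here is a small sketch that
turns each pair into a slowdown ratio (2.6.17.14 time divided by
2.6.16.31 time). The helper name `slowdown` is hypothetical, and the
inputs are simply the figures above converted to seconds:

```shell
#!/bin/sh
# slowdown BASE_SECONDS SLOW_SECONDS -> ratio with two decimal places
slowdown() {
    awk -v base="$1" -v slow="$2" 'BEGIN { printf "%.2f\n", slow / base }'
}

slowdown 60 900    # XFS over NFS:            1 min vs 15 min
slowdown 60 891    # XFS over NFS with patch: 1 min vs 14 min 51 s
slowdown 12 51     # local XFS:               12 s vs 51 s
slowdown 7 4       # local EXT3:              7 s vs 4 s
```

A ratio above 1 means the newer kernel is slower. On these numbers the
regression shows up on local XFS as well, while EXT3 does not slow down.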

> Otherwise we might have do to the 'git bisect' thing to find the
> offending patch.
>
> Thanks,
> NeilBrown

Thanks,
-- 
Jean-Noël BOUVIER

end of thread, other threads:[~2007-03-05 13:31 UTC | newest]

Thread overview: 8+ messages
2007-02-19 14:49 poor nfs performance & hangs with latest kernels Rich
2007-02-20  9:45 ` Neil Brown
2007-02-20 12:45   ` Rich
2007-02-21 13:40   ` Jean-Noel Bouvier
2007-02-22  2:28     ` Neil Brown
2007-02-22 19:42       ` Norman Weathers
2007-02-22 20:34         ` Trond Myklebust
2007-03-05 13:30       ` Jean-Noel BOUVIER
