* poor nfs performance & hangs with latest kernels
From: Rich @ 2007-02-19 14:49 UTC
To: nfs

hi.

i am having some pretty weird nfs performance problems.
(please, cc me, as i am not on the list).

when there is some intensive nfs activity (write), all other nfs
operations slow down to a crawl or even stop entirely during that time.

i have been able to reproduce the problem with kernel versions
2.6.16.40, 2.6.19.2 and 2.6.20 (on slackware-11.0).
another person reproduced the hang with 2.6.19-1.2911.fc6 (fedora core 6).

when the problem appears, access to the same data both locally and even
over ssh is happening without any slowdown, but nfs access is sometimes
slowed down significantly, in some cases even being unable to list a
directory for 30 minutes.
in some cases, not only nfs slowdown happens, but whole system hangs.

i have tried nfstools 1.0.10 and 1.0.7 & portmap 5.0 (though their
versions seem to have no impact).

there is one scenario where it is very easy to reproduce the problem
(note : don't try this on a remote system or one you can not afford to
hard reboot) :

export a local directory. i'm using
localhost(rw,no_root_squash,sync,no_subtree_check).
mount it locally and try to perform a write operation :
dd if=/dev/zero of=/mounted_nfs/testfile bs=512k count=2048

depending on machine, i have managed to get from 32mb (my workstation)
to ~ 880mb (server) files. then, either nfs becomes inaccessible (for
all hosts and all disks/exported filesystems), or machine hangs
completely.

if machine does not hang, its load increases steadily, and all cpus are
shown to have very high iowait load.

using 2.6.16.21, i was unable to hang my workstation, but server, even
though it survived the test, is still having excessive load (~ 4). top
lists as most resource hungry processes nfsd, kjournald and
kblockd.
in this case, stopping nfs server seems to lower the load and return the
system to normal state.

this seems to be very similar to an already reported bug,
http://bugme.osdl.org/show_bug.cgi?id=7943

has anybody else seen such behaviour ?
i would appreciate if somebody could test this or provide additional
things to test.

flips on #linux-nfs noted that "it would be good to get a sysrq-t
listing".
if required, i could try this on my workstation with either 16.21, 19.1
or .20 - though, if really needed, probably with any kernel version :)

in such a case i would be glad to receive some guidance on how best to
do this and gather the output.

thanks.
--
 Rich

-------------------------------------------------------------------------
Take Surveys. Earn Cash. Influence the Future of IT
Join SourceForge.net's Techsay panel and you'll get the chance to share your
opinions on IT & business topics through brief surveys-and earn cash
http://www.techsay.com/default.php?page=join.php&p=sourceforge&CID=DEVDEV
_______________________________________________
NFS maillist - NFS@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nfs
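The dd reproduction described in the message above can be sketched as a small script. This is only a sketch under stated assumptions: the function names dd_total_bytes and nfs_write_test are hypothetical, the mount point /mounted_nfs is the one from the message, and the export path in the comment is a placeholder. The arithmetic shows that bs=512k count=2048 asks dd for 1 GiB, so the reported 32mb to ~880mb files mean the write stalled before finishing.

```shell
# Hypothetical helpers, not part of the original report.

# Total bytes requested by "dd bs=<kib>k count=<count>".
dd_total_bytes() {
    kib=$1
    count=$2
    echo $(( kib * 1024 * count ))
}

# Run the write test from the report against a target file.
# WARNING: on an NFS loopback mount this may hang the machine.
nfs_write_test() {
    target=$1
    dd if=/dev/zero of="$target" bs=512k count=2048
}

# Assumed setup (the export path /export is a placeholder):
#   /etc/exports:  /export localhost(rw,no_root_squash,sync,no_subtree_check)
#   mount localhost:/export /mounted_nfs
# Example, only on a box you can afford to hard reboot:
#   nfs_write_test /mounted_nfs/testfile
```

dd_total_bytes 512 2048 prints 1073741824, i.e. exactly 1 GiB.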
* Re: poor nfs performance & hangs with latest kernels
From: Neil Brown @ 2007-02-20 9:45 UTC
To: Rich; +Cc: nfs

On Monday February 19, rich@hq.vsaa.lv wrote:
> hi. i am having some pretty weird nfs performance problems.
> (please, cc me, as i am not on the list).
>
> when there is some intensive nfs activity (write), all other nfs
> operations slow down to a crawl or even stop entirely during that time.
>
> i have been able to reproduce the problem with kernel versions
> 2.6.16.40, 2.6.19.2 and 2.6.20 (on slackware-11.0).
> another person reproduced the hang with 2.6.19-1.2911.fc6 (fedora core 6).

Are there any kernels where you cannot reproduce the problem?

>
> when the problem appears, access to the same data both locally and even
> over ssh is happening without any slowdown, but nfs access is sometimes
> slowed down significantly, in some cases even being unable to list a
> directory for 30 minutes.
> in some cases, not only nfs slowdown happens, but whole system hangs.

So we need to find out exactly what is happening when things slow
down.
Some things that might be useful:
  a tcpdump trace (use -s 0) of traffic while things are going slowly.
  "cat /proc/meminfo /proc/slabinfo". Get a copy when everything is
     fine, then another few when things are going slowly.
  Maybe "echo t > /proc/sysrq-trigger" and collect the kernel logs.
     If some processes are in 'D' status, this could give useful
     information.

Get the various information on both the server and client if
possible. Hopefully somewhere in all of that will be a clue.

> there is one scenario where it is very easy to reproduce the problem
> (note : don't try this on a remote system or one you can not afford to
> hard reboot) :
>
> export a local directory. i'm using
> localhost(rw,no_root_squash,sync,no_subtree_check).
> mount it locally and try to perform a write operation :
> dd if=/dev/zero of=/mounted_nfs/testfile bs=512k count=2048

This scenario is known to cause problems, is very hard to fix, and is
a case of "well don't do that then". The problems here are probably
unrelated to the problems you are having between separate machines.

>
> using 2.6.16.21, i was unable to hang my workstation, but server, even
> though it survived the test, is still having excessive load (~ 4). top
> lists as most resource hungry processes nfsd, kjournald and
> kblockd.

So 2.6.16.21 survives but 2.6.16.40 doesn't? Is that a reliable
result? Is that with separate server and client, or server and client
on the same machine?

NeilBrown
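The data collection suggested in the reply above (snapshots of /proc/meminfo and /proc/slabinfo while things are fine and again while they are slow, a tcpdump trace, a sysrq-t dump) could be scripted roughly as follows. This is a sketch, not a tested procedure: the helper name snapshot_files and the /tmp/nfsdiag directory are hypothetical, and the tcpdump filter assumes NFS traffic on the standard port 2049.

```shell
# Hypothetical helper: save a labelled copy of each given file
# (for example /proc/meminfo and /proc/slabinfo) into an output directory.
snapshot_files() {
    outdir=$1
    label=$2
    shift 2
    mkdir -p "$outdir" || return 1
    for f in "$@"; do
        # /proc/slabinfo is usually readable by root only; skip what we cannot read
        if [ -r "$f" ]; then
            cat "$f" > "$outdir/$label.$(basename "$f")"
        else
            echo "skipping unreadable $f" >&2
        fi
    done
}

# Possible session, run as root on both server and client:
#   snapshot_files /tmp/nfsdiag fine  /proc/meminfo /proc/slabinfo
#   tcpdump -s 0 -w /tmp/nfsdiag/slow.pcap port 2049 &
#   snapshot_files /tmp/nfsdiag slow1 /proc/meminfo /proc/slabinfo
#   echo t > /proc/sysrq-trigger
#   dmesg > /tmp/nfsdiag/sysrq-t.txt
```

Taking the "fine" snapshot first gives a baseline to diff the later "slow" snapshots against.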
* Re: poor nfs performance & hangs with latest kernels
From: Rich @ 2007-02-20 12:45 UTC
To: Neil Brown; +Cc: nfs

Neil Brown wrote:
> On Monday February 19, rich@hq.vsaa.lv wrote:
...
>> when there is some intensive nfs activity (write), all other nfs
>> operations slow down to a crawl or even stop entirely during that time.
>>
>> i have been able to reproduce the problem with kernel versions
>> 2.6.16.40, 2.6.19.2 and 2.6.20 (on slackware-11.0).
>> another person reproduced the hang with 2.6.19-1.2911.fc6 (fedora core 6).
>
> Are there any kernels where you cannot reproduce the problem?

well, now that my localhost tests are invalidated, i will have to do
additional testing :)

...
>> export a local directory. i'm using
>> localhost(rw,no_root_squash,sync,no_subtree_check).
>> mount it locally and try to perform a write operation :
>> dd if=/dev/zero of=/mounted_nfs/testfile bs=512k count=2048
>
> This scenario is known to cause problems, is very hard to fix, and is
> a case of "well don't do that then". The problems here are probably
> unrelated to the problems you are having between separate machines.

there i was, hoping to have found a reliable method to reproduce the
problem...
btw, what's the main cause for this problem ? could it also be observed
on fast networks or is it limited to cases when server & client are on
the same machine ?

>> using 2.6.16.21, i was unable to hang my workstation, but server, even
>> though it survived the test, is still having excessive load (~ 4). top
>> lists as most resource hungry processes nfsd, kjournald and
>> kblockd.
>
> So 2.6.16.21 survives but 2.6.16.40 doesn't? Is that a reliable
> result? Is that with separate server and client, or server and client
> on the same machine?

most tests were done on a single machine (3 different ones, though),
after i observed initial problems between several clients and single
server.
so now i will have to redo all the tests with separate client & server
machines :)

> NeilBrown
--
 Rich
* Re: poor nfs performance & hangs with latest kernels
From: Jean-Noel Bouvier @ 2007-02-21 13:40 UTC
To: Neil Brown; +Cc: rich, nfs

Hello,

I encounter the same NFS bad performance for kernels newer than 2.6.16.31

Tests : tar -xvf linux-2.4.32.tar

Environment :
- client 2.6.16.31
- client mounts a XFS file system through NFS on a remote machine with
  options = rw,tcp,intr
- server : exporting file system with options = rw,sync,insecure

Results : (according to server kernel version)
2.6.15.7  => 1 minute 10 sec
2.6.16.31 => 1 minute 02 sec
2.6.17.14 => 15 minutes
2.6.18.3  => 15 minutes

Neil Brown wrote:
> On Monday February 19, rich@hq.vsaa.lv wrote:
>
>>hi. i am having some pretty weird nfs performance problems.
>>(please, cc me, as i am not on the list).
>>
>>when there is some intensive nfs activity (write), all other nfs
>>operations slow down to a crawl or even stop entirely during that time.
>>
>>i have been able to reproduce the problem with kernel versions
>>2.6.16.40, 2.6.19.2 and 2.6.20 (on slackware-11.0).
>>another person reproduced the hang with 2.6.19-1.2911.fc6 (fedora core 6).
>
>
> Are there any kernels where you cannot reproduce the problem?

2.6.16.31 and 2.6.15.7 are OK.

>>when the problem appears, access to the same data both locally and even
>>over ssh is happening without any slowdown, but nfs access is sometimes
>>slowed down significantly, in some cases even being unable to list a
>>directory for 30 minutes.
>>in some cases, not only nfs slowdown happens, but whole system hangs.
>
>
> So we need to find out exactly what is happening when things slow
> down.
> Some things that might be useful:
>   a tcpdump trace (use -s 0) of traffic while things are going slowly.
>   "cat /proc/meminfo /proc/slabinfo". Get a copy when everything is
>      fine, then another few when things are going slowly.
>   Maybe "echo t > /proc/sysrq-trigger" and collect the kernel logs.
>      If some processes are in 'D' status, this could give useful
>      information.

Results :
- 2.6.16.31 : load average 1; 5 threads, and the nfsd processes are
  never in 'D' status.
- 2.6.18.3 : load average 5; 2 threads, and the nfsd processes are
  often in 'D' status.

> Get the various information on both the server and client if
> possible. Hopefully somewhere in all of that will be a clue.
>
>
>>there is one scenario where it is very easy to reproduce the problem
>>(note : don't try this on a remote system or one you can not afford to
>>hard reboot) :
>>
>>export a local directory. i'm using
>>localhost(rw,no_root_squash,sync,no_subtree_check).
>>mount it locally and try to perform a write operation :
>>dd if=/dev/zero of=/mounted_nfs/testfile bs=512k count=2048
>
>
> This scenario is known to cause problems, is very hard to fix, and is
> a case of "well don't do that then". The problems here are probably
> unrelated to the problems you are having between separate machines.
>
>
>>using 2.6.16.21, i was unable to hang my workstation, but server, even
>>though it survived the test, is still having excessive load (~ 4). top
>>lists as most resource hungry processes nfsd, kjournald and
>>kblockd.
>
>
> So 2.6.16.21 survives but 2.6.16.40 doesn't? Is that a reliable
> result? Is that with separate server and client, or server and client
> on the same machine?
>
> NeilBrown
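The two observations in the message above (how much slower the newer server kernels are, and whether nfsd threads sit in 'D' state) can be checked with a couple of small helpers. This is a sketch only: the function names slowdown_ratio and count_nfsd_in_d are hypothetical, and the ps invocation in the comment assumes the procps ps found on Linux, whose "stat" and "comm" output fields are used here.

```shell
# Ratio between a slow and a fast run, both in seconds, to one decimal.
slowdown_ratio() {
    fast=$1
    slow=$2
    LC_ALL=C awk -v a="$fast" -v b="$slow" 'BEGIN { printf "%.1f\n", b / a }'
}

# Count processes named nfsd whose state starts with D
# (uninterruptible sleep, usually waiting on I/O).
# Reads "STAT COMMAND" lines on stdin.
count_nfsd_in_d() {
    awk '$2 == "nfsd" && $1 ~ /^D/ { n++ } END { print n + 0 }'
}

# Usage on the server while the tar extraction runs:
#   ps -eo stat=,comm= | count_nfsd_in_d
# Figures from the message: 1 minute 02 sec = 62 s, 15 minutes = 900 s
#   slowdown_ratio 62 900
```

With the reported timings, the 2.6.17.14 and 2.6.18.3 servers come out roughly 14.5 times slower than 2.6.16.31.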
* Re: poor nfs performance & hangs with latest kernels
From: Neil Brown @ 2007-02-22 2:28 UTC
To: Jean-Noel Bouvier; +Cc: rich, nfs

On Wednesday February 21, jean-noel.bouvier@imag.fr wrote:
> Hello,
>
> I encounter the same NFS bad performance for kernels newer than 2.6.16.31
>
> Tests : tar -xvf linux-2.4.32.tar
>
> Environment :
> - client 2.6.16.31
> - client mounts a XFS file system through NFS on a remote machine with
> options = rw,tcp,intr
> - server : exporting file system with options = rw,sync,insecure
>
> Results : (according to server kernel version)
> 2.6.15.7 => 1 minute 10 sec
> 2.6.16.31 => 1 minute 02 sec
> 2.6.17.14 => 15 minutes
> 2.6.18.3 => 15 minutes

Those look like very strong results....

Could you try this patch on one of the later kernels and see if it
helps?
Otherwise we might have to do the 'git bisect' thing to find the
offending patch.

Thanks,
NeilBrown


Status: ok

Stop NFSD writes from being broken into lots of little writes to filesystem.

When NFSD receives a write request, the data is typically in a number
of 1448 byte segments and writev is used to collect them together.

Unfortunately, generic_file_buffered_write passes these to the filesystem
one at a time, so an e.g. 32K over-write becomes a series of partial-page
writes to each page, causing the filesystem to have to pre-read those
pages - wasted effort.

generic_file_buffered_write handles one segment of the vector at a
time as it has to pre-fault in each segment to avoid deadlocks. When
writing from kernel-space (and nfsd does) this is not an issue, so
generic_file_buffered_write does not need to break an iovec from nfsd
into little pieces.

This patch avoids the splitting when get_fs is KERNEL_DS as it is
from NFSd.

This issue was introduced by commit 6527c2bdf1f833cc18e8f42bd97973d583e4aa83

Cc: Nick Piggin <nickpiggin@yahoo.com.au>
Cc: Norman Weathers <norman.r.weathers@conocophillips.com>
Cc: Vladimir V. Saveliev <vs@namesys.com>

Signed-off-by: Neil Brown <neilb@suse.de>

### Diffstat output
 ./.patches/current/mm/filemap.c |   32 +++++++++++++++++++-------------
 1 file changed, 19 insertions(+), 13 deletions(-)

diff .prev/mm/filemap.c ./mm/filemap.c
--- ./mm/filemap.c	2007-02-16 13:49:40.000000000 +1100
+++ ./.patches/current/mm/filemap.c	2007-02-16 13:55:39.000000000 +1100
@@ -2137,21 +2137,27 @@ generic_file_buffered_write(struct kiocb
 		/* Limit the size of the copy to the caller's write size */
 		bytes = min(bytes, count);
 
-		/*
-		 * Limit the size of the copy to that of the current segment,
-		 * because fault_in_pages_readable() doesn't know how to walk
-		 * segments.
+		/* We only need to worry about prefaulting when writes are from
+		 * user-space. NFSd uses vfs_writev with several non-aligned
+		 * segments in the vector, and limiting to one segment a time is
+		 * a noticeable performance for re-write
 		 */
-		bytes = min(bytes, cur_iov->iov_len - iov_base);
-
-		/*
-		 * Bring in the user page that we will copy from _first_.
-		 * Otherwise there's a nasty deadlock on copying from the
-		 * same page as we're writing to, without it being marked
-		 * up-to-date.
-		 */
-		fault_in_pages_readable(buf, bytes);
+		if (!segment_eq(get_fs(), KERNEL_DS)) {
+			/*
+			 * Limit the size of the copy to that of the current
+			 * segment, because fault_in_pages_readable() doesn't
+			 * know how to walk segments.
+			 */
+			bytes = min(bytes, cur_iov->iov_len - iov_base);
 
+			/*
+			 * Bring in the user page that we will copy from
+			 * _first_. Otherwise there's a nasty deadlock on
+			 * copying from the same page as we're writing to,
+			 * without it being marked up-to-date.
+			 */
+			fault_in_pages_readable(buf, bytes);
+		}
 		page = __grab_cache_page(mapping,index,&cached_page,&lru_pvec);
 		if (!page) {
 			status = -ENOMEM;
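The effect described in the patch notes above (a 32K over-write arriving as 1448-byte segments turns into many partial-page writes) can be illustrated with a little arithmetic. The following is a simplified model, not kernel code: the function name count_chunks is hypothetical, and it assumes 4096-byte pages, a write starting at a page-aligned offset, and segments that are all 1448 bytes except the last. Each loop iteration mirrors one pass of the copy loop: the chunk is limited to the end of the current page and, in the pre-patch case, also to the end of the current segment.

```shell
# Count copy iterations for a write of <total> bytes starting at offset 0,
# delivered in <seg>-byte iovec segments, with <page>-byte pages.
# split=1 models the pre-patch loop (chunk ends at page end OR segment end);
# split=0 models the patched kernel-space path (chunk ends at page end only).
count_chunks() {
    total=$1
    seg=$2
    page=$3
    split=$4
    pos=0
    n=0
    while [ "$pos" -lt "$total" ]; do
        bytes=$(( page - pos % page ))        # up to the end of the current page
        left=$(( total - pos ))
        if [ "$bytes" -gt "$left" ]; then     # min(bytes, count)
            bytes=$left
        fi
        if [ "$split" -eq 1 ]; then
            seg_left=$(( seg - pos % seg ))   # up to the end of the current segment
            if [ "$seg_left" -lt "$bytes" ]; then
                bytes=$seg_left
            fi
        fi
        pos=$(( pos + bytes ))
        n=$(( n + 1 ))
    done
    echo "$n"
}

# count_chunks 32768 1448 4096 1   # segment-limited copy
# count_chunks 32768 1448 4096 0   # whole-page copy
```

Under these assumptions the segment-limited loop makes 30 passes, none of which covers a full page, while the page-limited loop makes 8 passes of one full page each, so the filesystem never has to pre-read a page it is about to overwrite completely.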
* Re: poor nfs performance & hangs with latest kernels
From: Norman Weathers @ 2007-02-22 19:42 UTC
To: nfs

Neil,

I don't remember if I responded when you guys sent out the patch
originally. It indeed helps the rewrite situation quite a bit with the
2.6.20 kernel. Hopefully, it will help Jean-Noel's problem as well.
Thanks for getting that pushed through.

There is another issue, and I am not sure whether it is related.
About 18 months ago, we did some performance testing with our cluster
nodes, and noticed very good performance using iozone (good performance
meaning we had ~ 170 MB/s writes, even better for the rewrite mode, and
off the chart with read, due to caching of course). Now, after a couple
of updates, we have noticed that our writes and reads are about the
same, but our rewrites using iozone are sometimes 1/2 of the performance
of the writes. Also, during that time, the load average (10 nodes,
gigabit connected, server bonded dual gigabit to our core switch) is
anywhere from 15 to 32, and as it gets higher (closer to the 32), it can
cause the server to be really slow. During this time, the disk io wait
is high, but processor load is low. We have updated our clients
from an older 2.6.12.1 to a 2.6.14-ck1 patch, and the servers from a
2.6.11.7 to 2.6.14-ck1 as well. Have there been any big changes in the
way clients or servers request write information during an iozone
rewrite? If I truncate the file and rewrite from scratch, I
can almost always get the > 170 MB/s on the wire to my boxes...

So, the patch does now allow the iozones to complete, but there is still
a high load on the boxes.

Any testing or other information I can get that will help, please let me
know.

Pertinent Information:
64 bit servers and clients
FC3 and FC4 on the clients using 2.6.14 and later kernels
FC3 and FC6 on the servers using 2.6.14 and later kernels
All hosts are gigabit ethernet with the servers connected directly to
our core switches, and clients connected to gigabit edge switches that
are quad trunked back to the core (~ 64 nodes on the edge switch, but
all nodes are idle during the test).

Norman Weathers

On Thu, 2007-02-22 at 13:28 +1100, Neil Brown wrote:
> On Wednesday February 21, jean-noel.bouvier@imag.fr wrote:
> > Hello,
> >
> > I encounter the same NFS bad performance for kernels newer than 2.6.16.31
> >
> > Tests : tar -xvf linux-2.4.32.tar
> >
> > Environment :
> > - client 2.6.16.31
> > - client mounts a XFS file system through NFS on a remote machine with
> > options = rw,tcp,intr
> > - server : exporting file system with options = rw,sync,insecure
> >
> > Results : (according to server kernel version)
> > 2.6.15.7 => 1 minute 10 sec
> > 2.6.16.31 => 1 minute 02 sec
> > 2.6.17.14 => 15 minutes
> > 2.6.18.3 => 15 minutes
>
> Those look like very strong results....
>
> Could you try this patch on one of the later kernels and see if it
> helps?
> Otherwise we might have to do the 'git bisect' thing to find the
> offending patch.
>
> Thanks,
> NeilBrown
>
>
> Status: ok
>
> Stop NFSD writes from being broken into lots of little writes to filesystem.
>
> When NFSD receives a write request, the data is typically in a number
> of 1448 byte segments and writev is used to collect them together.
>
> Unfortunately, generic_file_buffered_write passes these to the filesystem
> one at a time, so an e.g. 32K over-write becomes a series of partial-page
> writes to each page, causing the filesystem to have to pre-read those
> pages - wasted effort.
>
> generic_file_buffered_write handles one segment of the vector at a
> time as it has to pre-fault in each segment to avoid deadlocks. When
> writing from kernel-space (and nfsd does) this is not an issue, so
> generic_file_buffered_write does not need to break an iovec from nfsd
> into little pieces.
>
> This patch avoids the splitting when get_fs is KERNEL_DS as it is
> from NFSd.
>
> This issue was introduced by commit 6527c2bdf1f833cc18e8f42bd97973d583e4aa83
>
> Cc: Nick Piggin <nickpiggin@yahoo.com.au>
> Cc: Norman Weathers <norman.r.weathers@conocophillips.com>
> Cc: Vladimir V. Saveliev <vs@namesys.com>
>
> Signed-off-by: Neil Brown <neilb@suse.de>
>
> ### Diffstat output
>  ./.patches/current/mm/filemap.c |   32 +++++++++++++++++++-------------
>  1 file changed, 19 insertions(+), 13 deletions(-)
>
> diff .prev/mm/filemap.c ./mm/filemap.c
> --- ./mm/filemap.c	2007-02-16 13:49:40.000000000 +1100
> +++ ./.patches/current/mm/filemap.c	2007-02-16 13:55:39.000000000 +1100
> @@ -2137,21 +2137,27 @@ generic_file_buffered_write(struct kiocb
>  		/* Limit the size of the copy to the caller's write size */
>  		bytes = min(bytes, count);
>
> -		/*
> -		 * Limit the size of the copy to that of the current segment,
> -		 * because fault_in_pages_readable() doesn't know how to walk
> -		 * segments.
> +		/* We only need to worry about prefaulting when writes are from
> +		 * user-space. NFSd uses vfs_writev with several non-aligned
> +		 * segments in the vector, and limiting to one segment a time is
> +		 * a noticeable performance for re-write
>  		 */
> -		bytes = min(bytes, cur_iov->iov_len - iov_base);
> -
> -		/*
> -		 * Bring in the user page that we will copy from _first_.
> -		 * Otherwise there's a nasty deadlock on copying from the
> -		 * same page as we're writing to, without it being marked
> -		 * up-to-date.
> -		 */
> -		fault_in_pages_readable(buf, bytes);
> +		if (!segment_eq(get_fs(), KERNEL_DS)) {
> +			/*
> +			 * Limit the size of the copy to that of the current
> +			 * segment, because fault_in_pages_readable() doesn't
> +			 * know how to walk segments.
> +			 */
> +			bytes = min(bytes, cur_iov->iov_len - iov_base);
>
> +			/*
> +			 * Bring in the user page that we will copy from
> +			 * _first_. Otherwise there's a nasty deadlock on
> +			 * copying from the same page as we're writing to,
> +			 * without it being marked up-to-date.
> +			 */
> +			fault_in_pages_readable(buf, bytes);
> +		}
>  		page = __grab_cache_page(mapping,index,&cached_page,&lru_pvec);
>  		if (!page) {
>  			status = -ENOMEM;
* Re: poor nfs performance & hangs with latest kernels
From: Trond Myklebust @ 2007-02-22 20:34 UTC
To: Norman Weathers; +Cc: nfs

On Thu, 2007-02-22 at 13:42 -0600, Norman Weathers wrote:
> Neil,
>
> I don't remember if I responded when you guys sent out the patch
> originally. It indeed helps the rewrite situation quite a bit with the
> 2.6.20 kernel. Hopefully, it will help Jean-Noel's problem as well.
> Thanks for getting that pushed through.
>
> There is another issue, and I am not sure whether it is related.
> About 18 months ago, we did some performance testing with our cluster
> nodes, and noticed very good performance using iozone (good performance
> meaning we had ~ 170 MB/s writes, even better for the rewrite mode, and
> off the chart with read, due to caching of course). Now, after a couple
> of updates, we have noticed that our writes and reads are about the
> same, but our rewrites using iozone are sometimes 1/2 of the performance
> of the writes. Also, during that time, the load average (10 nodes,
> gigabit connected, server bonded dual gigabit to our core switch) is
> anywhere from 15 to 32, and as it gets higher (closer to the 32), it can
> cause the server to be really slow. During this time, the disk io wait
> is high, but processor load is low. We have updated our clients
> from an older 2.6.12.1 to a 2.6.14-ck1 patch, and the servers from a
> 2.6.11.7 to 2.6.14-ck1 as well. Have there been any big changes in the
> way clients or servers request write information during an iozone
> rewrite? If I truncate the file and rewrite from scratch, I
> can almost always get the > 170 MB/s on the wire to my boxes...
>
> So, the patch does now allow the iozones to complete, but there is still
> a high load on the boxes.
>
> Any testing or other information I can get that will help, please let me
> know.
>
> Pertinent Information:
> 64 bit servers and clients
> FC3 and FC4 on the clients using 2.6.14 and later kernels
> FC3 and FC6 on the servers using 2.6.14 and later kernels
> All hosts are gigabit ethernet with the servers connected directly to
> our core switches, and clients connected to gigabit edge switches that
> are quad trunked back to the core (~ 64 nodes on the edge switch, but
> all nodes are idle during the test).
>
> Norman Weathers

I'm confused. Are you seeing this with a stock 2.6.20 kernel, or only
the older kernels?

Cheers
  Trond
* Re: poor nfs performance & hangs with latest kernels
From: Jean-Noel BOUVIER @ 2007-03-05 13:30 UTC
To: Neil Brown; +Cc: rich, nfs

Hello,

On Thu, 22 Feb 2007, Neil Brown wrote:

> On Wednesday February 21, jean-noel.bouvier@imag.fr wrote:
>> Hello,
>>
>> I encounter the same NFS bad performance for kernels newer than 2.6.16.31
>>
>> Tests : tar -xvf linux-2.4.32.tar
>>
>> Environment :
>> - client 2.6.16.31
>> - client mounts a XFS file system through NFS on a remote machine with
>> options = rw,tcp,intr
>> - server : exporting file system with options = rw,sync,insecure
>>
>> Results : (according to server kernel version)
>> 2.6.15.7 => 1 minute 10 sec
>> 2.6.16.31 => 1 minute 02 sec
>> 2.6.17.14 => 15 minutes
>> 2.6.18.3 => 15 minutes
>
> Those look like very strong results....
>
> Could you try this patch on one of the later kernels and see if it
> helps?

Unfortunately, this patch did not improve performance :

Linux 2.6.16.31/XFS+NFS         => 1min     (4 processes with status S+S+R+D)
Linux 2.6.17.14/XFS+NFS         => 15min    (2 processes with status S+D)
Linux 2.6.17.14/XFS+NFS + patch => 14min51  (2 processes with status S+D)

But, I also ran the same tests on XFS and EXT3 file systems
***without*** NFS, just to be sure :

Linux 2.6.16.31/XFS => 0 min 12s | Linux 2.6.16.31/EXT3 => 0 min 07s
Linux 2.6.17.14/XFS => 0 min 51s | Linux 2.6.17.14/EXT3 => 0 min 04s
Linux 2.6.18-4/XFS  => 0 min 57s | Linux 2.6.18-4/EXT3  => 0 min 02s

So I am afraid the problem is tied to XFS and >2.6.16 kernels rather
than to NFS.
Do you agree, or do you want me to run other tests to be sure NFS is
not involved ?

> Otherwise we might have to do the 'git bisect' thing to find the
> offending patch.
>
> Thanks,
> NeilBrown

Thanks,

--
Jean-Noël BOUVIER
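The local comparison in the message above (the same tar extraction timed on XFS and on EXT3, without NFS in between) could be repeated with a helper along these lines. It is a sketch under assumptions: the name time_untar and the mount points in the usage comment are hypothetical, "date +%s" is assumed to be available (it is on GNU and BSD systems), and the result has one-second resolution. Whether the original measurements included a final sync is not stated, so none is done here.

```shell
# Extract a tarball into a destination directory and print the
# elapsed wall-clock time in whole seconds.
time_untar() {
    tarball=$1
    dest=$2
    mkdir -p "$dest" || return 1
    start=$(date +%s)
    tar -xf "$tarball" -C "$dest" || return 1
    end=$(date +%s)
    echo $(( end - start ))
}

# Usage, with placeholder mount points for the two file systems:
#   time_untar linux-2.4.32.tar /mnt/xfs/untar-test
#   time_untar linux-2.4.32.tar /mnt/ext3/untar-test
```

Running both lines on each kernel version would give the same kind of XFS versus EXT3 table as in the message.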
end of thread (newest message: 2007-03-05 13:31 UTC)

Thread overview: 8+ messages
2007-02-19 14:49 poor nfs performance & hangs with latest kernels Rich
2007-02-20  9:45 ` Neil Brown
2007-02-20 12:45   ` Rich
2007-02-21 13:40   ` Jean-Noel Bouvier
2007-02-22  2:28     ` Neil Brown
2007-02-22 19:42       ` Norman Weathers
2007-02-22 20:34         ` Trond Myklebust
2007-03-05 13:30       ` Jean-Noel BOUVIER