From: "Patrice Seyed"
Subject: ReasmFails increases / NFS performance on Linux Cluster
Date: Sat, 14 Aug 2004 03:46:44 -0400
To: nfs@lists.sourceforge.net

We have 1 IBM x345 as a management node and 1 IBM x35 as a storage node
(both dual-CPU 2.8 GHz Xeon with 2.5 GB memory); the storage node is
attached over Fibre Channel to a RAID 5 array with ~900 GB usable. We also
have 10 IBM BladeCenter chassis holding 14 blades each, except the last
one, which has 8. That makes 134 nodes, each with dual 2.8 GHz Xeons,
mostly with 1 GB RAM (20 have 2 GB). All nodes, including the x345s, are
set to 1000/full and cabled into a Catalyst 3750. Bear in mind that each
BladeCenter chassis includes a switch module, which is also set to
1000/full.

When I test dd writes from /dev/zero to an NFS mount (and reads back) in a
large number of batch jobs, I see high load on the storage node and heavy
slowdowns, for example for ssh logins, df, or ls on the head node.

In trying to tune the storage node, I have used 32k and now 8k
(rsize,wsize in autofs) with no improvement in the slowness. The other
mount options are hard,intr,noatime,retrans=20,timeo=25, and I am
currently running 64 nfsd daemons. I also now have ipfrag_low_thresh and
ipfrag_high_thresh both set to 1045876; when I doubled the default values
a few weeks ago, that appeared to resolve the I/O errors that had been
appearing in the logs on many of the nodes. Still, the ReasmFails counter
in /proc/net/snmp increases steadily (with either 8k or 32k) whenever I
submit a moderate number (20-40) of I/O-heavy jobs.

More info (on the storage node):

$ netstat -s | less
Ip:
    226750504 total packets received
    0 forwarded
    501200 incoming packets discarded
    164658619 incoming packets delivered
    161895650 requests sent out
    5006 fragments dropped after timeout
    80794698 reassemblies required
    13723023 packets reassembled ok
    126665 packet reassembles failed
    2858536 fragments received ok
    14006786 fragments created
Udp:
    159845714 packets received
    142 packets to unknown port received.
    501200 packet receive errors
    152909713 packets sent

I welcome any suggestions or recommendations.

Regards,

Patrice Seyed
Linux System Administrator - SIG
RHCE, SCSA
Boston University School of Medicine
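P.S. In case it helps anyone reproduce this, the batch jobs are
essentially dd pairs of the following form. The block size, count, and
path here are illustrative rather than our exact values:

    # write ~1 GB of zeros over the NFS mount, then read it back
    dd if=/dev/zero of=/nfs/scratch/ddtest.$$ bs=8k count=131072
    dd if=/nfs/scratch/ddtest.$$ of=/dev/null bs=8k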
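The autofs map entry behind the mount looks roughly like this;
"storage1:/export/data" and the key "scratch" stand in for our real server,
export, and mount key:

    # /etc/auto.data -- NFS map entry with the mount options mentioned above
    scratch  -rw,hard,intr,noatime,rsize=8192,wsize=8192,retrans=20,timeo=25  storage1:/export/data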
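The fragment thresholds are applied like so (though as I understand it,
ipfrag_high_thresh is normally kept above ipfrag_low_thresh, so setting
them equal may itself be worth questioning):

    # raise the IP fragment reassembly memory limits (in bytes)
    echo 1045876 > /proc/sys/net/ipv4/ipfrag_low_thresh
    echo 1045876 > /proc/sys/net/ipv4/ipfrag_high_thresh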
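And this is how I watch the reassembly counters while the jobs run;
ReasmReqds, ReasmOKs, and ReasmFails are columns of the Ip: lines in
/proc/net/snmp:

    # refresh the Ip counter lines every 5 seconds
    watch -n 5 'grep ^Ip: /proc/net/snmp'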