From: Christopher Smith
To: nfs@lists.sourceforge.net
Subject: Poor performance (especially writes)
Date: Fri, 08 Dec 2006 06:24:33 +1100
Message-ID: <45786A71.7080908@nighthawkrad.net>

(Apologies if this is a repost, but I haven't seen it appear four hours later, and there's no indication that the list is moderated...)

Hello all,

I am having some trouble extracting good (or even average) performance out of my NFS server, particularly for writes, and was hoping to get some suggestions on how to improve it.
The situation:

We have an in-house app that receives, processes and resends image files. Larger (512k - 1M) images come into two "indexing nodes", which extract some header data from them and write them to an NFS share ("store"). The indexing nodes then re-read these "large" images from the share, compress and resize them, and write the smaller (150k - 400k) files to another NFS share ("images"). Finally, a "sender" program running on a third machine displays the smaller images in a viewer to end users (serving them up via Apache, reading from "images") and, on demand, sends the corresponding set of larger images (from "store") to a number of destination machines (using scp, I believe).

Apparently the viewer's read pattern on the smaller images is quite pathological, with lots of random, very small (10k - 50k) reads from the actual image files.

Currently the NFS server is a Dell PE750 with 4G RAM, two 400G 7200 RPM SATA drives and a ~3GHz P4. It has an identical twin in a failover configuration via heartbeat and DRBD. The "images" and "store" shares are on different spindles. Both servers are running Fedora Core 4, kernel 2.6.13-1.1526_FC4smp.

We are hitting a wall with the current config due, I believe, to IO contention on the poor little SATA drives. While we do have medium-term plans to move to something SAN-ish with GFS (or similar), and the development team has big promises of removing the need for any sort of commonly-shared storage completely, in the interim we need to make do with NFS.

We have two newer servers which I have started to configure. They are Dell PE860s with 4G RAM, one 2.4GHz dual-core Xeon, one 146G 15k RPM SAS disk (for "images") and one 300G 10k RPM SAS disk (for "store"). After completing a functionally identical setup (albeit using CentOS 4 x86-64 instead of Fedora) and attaching it to our indexing and viewer nodes, we discovered it was actually _slower_ than the existing servers by a significant margin (~10 images/sec vs 17 - 19 images/sec).
"17 - 19 images/sec" translates roughly to a combined (store and images) rate of 20 - 25MiB/sec of incoming data. Unfortunately our app can't easily simulate the "reading and sending" load automatically, so we can't easily produce benchmarks that take into account the additional disk contention this introduces, but for the purposes of this tuning that's not really relevant. (We have tried these tests with and without DRBD active; the DRBD overhead is essentially zero, so it is not a factor.)

There appears to be some bottleneck in the newer servers that is making them slower. I don't believe it's hardware, which means that the CentOS+NFS combination needs some tuning. I'm not married to CentOS, although it is our preferred platform. I'm willing to try newer kernels and/or Fedora if that will improve performance. I'd rather not stray from Red Hat-derived distros, however, as we have a significant administrative and skills investment in them.

For benchmarking, since I can only use our production systems during a very small window that is extremely inconvenient for my timezone, I have captured a set of sample data from the "images" and "store" shares. I am testing by copying this data around, for some semblance of relevance to the real-life usage patterns. It consists of:

    images.tar   4889 files    684MiB
    store.tar    4954 files   2103MiB

I have also temporarily limited the amount of memory on the NFS server to 512M, so the physical disks do actually see some action.

The only "optimisation" I have made is bumping up the number of NFSDs to 96 (mirroring our current production environment), so the config and mount options are otherwise defaults. (I believe this will result in TCP mounts, which wouldn't work well with our failover server config, but we're willing to accept that tradeoff for better performance.)

The machines are all connected to a Cisco 3750 gigabit switch which is otherwise empty.
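Because the untar runs in the results below write thousands of small files, their numbers are easier to compare per file than per MiB. The following is a minimal POSIX sh + awk sketch of that arithmetic, plus a way to time two jobs run side by side. The function names (rate, files_per_sec, avg_kib, time_pair) are hypothetical helpers, not part of any existing tool, and `date +%s` assumes GNU or BusyBox date. One note on the combined-run commands in the results: in `date; time cp A & time cp B && date`, only the first `time cp` is backgrounded, and the final `date` runs as soon as the second (foreground) job finishes, without waiting for the first. Since images.tar is the smaller job it probably finished first in these runs, so the reported times are likely unaffected, but starting both jobs and then calling `wait` on each removes that doubt.

```shell
#!/bin/sh
# Hypothetical benchmark helpers (illustrative names, not from any real tool).
# Sizes are in MiB and times in whole seconds, matching the test results.
# Plain (non-local) variables are used to stay within POSIX sh.

# rate SIZE_MIB SECONDS: throughput in MiB/s, two decimals
rate() {
  awk -v s="$1" -v t="$2" 'BEGIN { printf "%.2f\n", s / t }'
}

# files_per_sec FILES SECONDS: files written per second, one decimal
files_per_sec() {
  awk -v n="$1" -v t="$2" 'BEGIN { printf "%.1f\n", n / t }'
}

# avg_kib SIZE_MIB FILES: average file size in KiB, rounded to an integer
avg_kib() {
  awk -v s="$1" -v n="$2" 'BEGIN { printf "%.0f\n", s * 1024 / n }'
}

# time_pair CMD1 CMD2: start both commands concurrently, wait for BOTH,
# print the elapsed wall-clock seconds, and return non-zero if either failed.
time_pair() {
  tp_start=$(date +%s)
  sh -c "$1" &
  tp_pid1=$!
  sh -c "$2" &
  tp_pid2=$!
  wait "$tp_pid1"; tp_rc1=$?   # exit status of the first job
  wait "$tp_pid2"; tp_rc2=$?   # exit status of the second job
  echo $(( $(date +%s) - tp_start ))
  [ "$tp_rc1" -eq 0 ] && [ "$tp_rc2" -eq 0 ]
}
```

For example, `files_per_sec 4889 372` gives about 13.1 files/s for the NFS untar of images.tar, against about 325.9 files/s (`files_per_sec 4889 15`) for the same untar on local disk, which suggests per-file overhead rather than raw bandwidth is the limiting factor. `avg_kib 684 4889` and `avg_kib 2103 4954` put the average sample file at roughly 143KiB and 435KiB respectively. The combined copy could be timed as `time_pair "cp /data/images.tar /mnt/images" "cp /data/store.tar /mnt/store"`.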
The only slightly abnormal thing about the network is that the machines use dot1q (802.1Q) trunks, i.e.:

    eth0      Link encap:Ethernet  HWaddr 00:15:C5:F5:B1:0D
              inet6 addr: fe80::215:c5ff:fef5:b10d/64 Scope:Link
              UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
              RX packets:11990734 errors:0 dropped:0 overruns:0 frame:0
              TX packets:20114864 errors:0 dropped:0 overruns:0 carrier:0
              collisions:0 txqueuelen:1000
              RX bytes:1427499066 (1.3 GiB)  TX bytes:29822008141 (27.7 GiB)
              Interrupt:169

    eth0.180  Link encap:Ethernet  HWaddr 00:15:C5:F5:B1:0D
              inet addr:10.184.2.160  Bcast:10.184.2.255  Mask:255.255.255.0
              inet6 addr: fe80::215:c5ff:fef5:b10d/64 Scope:Link
              UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
              RX packets:11947553 errors:0 dropped:0 overruns:0 frame:0
              TX packets:20114877 errors:0 dropped:0 overruns:0 carrier:0
              collisions:0 txqueuelen:0
              RX bytes:1161282256 (1.0 GiB)  TX bytes:29661082251 (27.6 GiB)

I have also tried raising the MTU, but that doesn't seem to work:

    [root@justinstalled ~]# ifconfig eth0 mtu 9000
    SIOCSIFMTU: Invalid argument
    [root@justinstalled ~]#

Maybe the tg3 driver doesn't support jumbo frames? In any event, I wouldn't expect it to make the order-of-magnitude improvement I'm looking for.

As you'll see, the performance for big, sequential writes is quite respectable, but as soon as lots of smaller files are involved, it all goes to pot. I'd very much appreciate any advice people can offer on making this faster.

The numbers:

Baseline performance (local disk -> local disk on the NFS server). I don't expect to get this performance over NFS, but it'd be nice to get closer, especially for the untarring stuff...
    Copy images.tar from /export/store to /export/images
        684M in 13s = 52M/s
    Copy store.tar from /export/images to /export/store
        2103M in 33s = 64M/s
    Untar images.tar from /export/store to /export/images
        time tar -C /export/images -xf /export/store/images.tar
        4889 files, 684M in 15s = 46M/s
    Untar store.tar from /export/images to /export/store
        time tar -C /export/store -xf /export/images/store.tar
        4954 files, 2103M in 51s = 41M/s

Remote performance #1 (NFS client -> NFS server). Single client, single action, default NFS options:

    Copy images.tar
        time cp /data/images.tar /mnt/images
        684M in 18s = 38M/s
    Copy store.tar
        time cp /data/store.tar /mnt/store
        2103M in 60s = 35M/s
    Untar images.tar
        time tar -C /mnt/images -xf /data/images.tar
        4889 files, 684M in 372s = 1.8M/s
    Untar store.tar
        time tar -C /mnt/store -xf /data/store.tar
        4954 files, 2103M in 578s = 3.6M/s

Remote performance #2 (NFS client -> NFS server). Single client, combined action, default NFS options:

    Copy images.tar and store.tar simultaneously
        date; time cp /data/images.tar /mnt/images & time cp /data/store.tar /mnt/store && date
        2787MiB in 63s = 44M/s
    Untar images.tar and store.tar simultaneously
        date; time tar -C /mnt/images -xf /data/images.tar & time tar -C /mnt/store -xf /data/store.tar && date
        9843 files, 2787MiB in 576s = 4.8M/s

Cheers,
CS

--
Christopher Smith
Systems Administrator
Nighthawk Radiology Services
Level 11, Suite 1101
Grosvenor Place
225 George Street
Sydney NSW 2000 Australia

T: 612 - 8211 2300 (IP: x8163)
F: 612 - 8211 2333
M: 61 - 407 397 563
USA Toll free: 866 241 6635

E: csmith@nighthawkrad.net
I: www.nighthawkrad.net