* Re: NFS tuning - high performance throughput.
From: M. Todd Smith @ 2005-06-14 20:38 UTC
To: nfs

Charles,

I've tried mounting over both TCP and UDP, with rsize/wsize values from
8k to 32k; 32k has given us the best performance so far.

Indeed, I did skip a level with the ping times, didn't I? 200 microseconds
it is.

Cheers
Todd

Lever, Charles wrote:

>first place i would look is mount options on the client. are you
>mounting with NFS over TCP, or NFS over UDP?
>
>(btw, your ping time should be 200 microseconds, not 200 nanoseconds).

-- 
Systems Administrator
----------------------------------
Soho VFX - Visual Effects Studio
99 Atlantic Avenue, Suite 303
Toronto, Ontario, M6K 3J8
(416) 516-7863
http://www.sohovfx.com
----------------------------------

-------------------------------------------------------
SF.Net email is sponsored by: Discover Easy Linux Migration Strategies
from IBM. Find simple to follow Roadmaps, straightforward articles,
informative Webcasts and more! Get everything you need to get up to
speed, fast. http://ads.osdn.com/?ad_id=7477&alloc_id=16492&op=click
_______________________________________________
NFS maillist - NFS@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nfs
* Re: NFS tuning - high performance throughput.
From: Dan Stromberg @ 2005-06-15 1:56 UTC
To: M. Todd Smith; +Cc: nfs, strombrg

Sorry, I'm coming to the discussion late, but perhaps one of these URLs
will help?

http://dcs.nac.uci.edu/~strombrg/NFS-troubleshooting-2.html
http://dcs.nac.uci.edu/~strombrg/network-performance.html
* RE: NFS tuning - high performance throughput.
From: Lever, Charles @ 2005-06-14 20:40 UTC
To: M. Todd Smith; +Cc: nfs

have you run any network tests, like iperf or ttcp, to determine if
you are getting adequate network throughput?

what are the export options you're using on the server?
* NFS tuning - high performance throughput.
From: M. Todd Smith @ 2005-06-14 20:17 UTC
To: nfs

Hello all,

I've been attempting to get a better handle on our NFS performance as our
network keeps growing.

We've recently upgraded our NFS server machine to a dual 3.2 GHz Xeon
running Fedora Core 3 (kernel 2.6.11-1.14_FC3smp) with 4 GB of RAM. In
this machine we have two Broadcom NetXtreme dual-port PCI-X NICs on their
own PCI-X bus (133 MHz). Attached to it is our Fibre Channel SAN, using
Seagate Fibre Channel drives and an LSI dual-channel 2 Gbit Fibre Channel
adapter on its own PCI-X bus. Local read/write throughput is
~135 Mbytes/sec.

The 4 ports are bonded together (balance-rr, i.e. round-robin mode) and
trunked on our Extreme Networks Summit 400i switch. All the test machines
are attached to this switch, so everything is within one hop, with ping
times of less than 200 nanoseconds.

The current test machine is running SUSE 9.2 and has an Intel PRO/1000 XT
server adapter (e1000 driver) on a shared but lightly used PCI bus. Other
test machines include some Dell PowerEdge 1850s with onboard Intel NICs,
and some Apple G4s, G5s and Xserves.

I have read most of the tuning guides I can find on the net and tried
just about everything I can get my hands on (I have not tried jumbo
frames yet; I'm still waiting for some downtime to attempt that). My
problem is that no matter how I tune the machines, the most I can get is
45Mb/ps throughput on NFS. This is the same throughput we were getting
with our old server with PCI cards; moreover, it is roughly the same for
every machine on our network. Theoretically, we should be able to get
much higher values.

Any idea why this is? I can provide config files and such if needed, but
I'm really at a loss as to where to start.

Cheers
Todd
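To put a rough number on "much higher values", here is a back-of-the-envelope sketch in Python. It assumes every Ethernet port runs at 1 Gbit/s, ignores Ethernet/IP/RPC overhead, and takes the ~135 Mbytes/sec local figure at face value, so treat the result as an upper bound rather than a prediction. The point it illustrates is that the four-port bond raises the server's aggregate capacity, while any single client is still capped by its own gigabit link.

```python
# Back-of-the-envelope throughput ceiling for one NFS client (sketch only).
# Assumptions: 1 Gbit/s per Ethernet port, no protocol overhead,
# decimal megabytes (1 MB = 10**6 bytes).

GBIT_PER_S = 10**9  # line rate of one gigabit port, in bits per second


def link_mb_per_s(ports: int, bits_per_port: int = GBIT_PER_S) -> float:
    """Raw line rate of `ports` aggregated links, in MB/s."""
    return ports * bits_per_port / 8 / 10**6


def single_client_ceiling(server_ports: int, client_ports: int,
                          disk_mb_per_s: float) -> float:
    """The slowest stage on the path bounds what one client can see."""
    return min(link_mb_per_s(server_ports),
               link_mb_per_s(client_ports),
               disk_mb_per_s)


if __name__ == "__main__":
    print(f"server bond (4 ports): {link_mb_per_s(4):.0f} MB/s")
    print(f"client NIC (1 port):   {link_mb_per_s(1):.0f} MB/s")
    ceiling = single_client_ceiling(4, 1, 135.0)
    print(f"single-client ceiling: {ceiling:.0f} MB/s")
```

On these assumptions one client tops out around 125 MB/s of raw line rate, which is still well above the ~45 MB/s being observed.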
* Re: NFS tuning - high performance throughput.
From: Bill Rugolsky Jr. @ 2005-06-14 20:41 UTC
To: M. Todd Smith; +Cc: nfs

On Tue, Jun 14, 2005 at 04:17:48PM -0400, M. Todd Smith wrote:
> We've recently upgraded our NFS server machine to a dual 3.2ghz Xeon
> running Fedora Core 3 (kernel 2.6.11-1.14_FC3smp) w/ 4Gb RAM. Coupled
> with this machine we have a 2 Broadcom NetExtreme 2 Port PCI-X NIC on
> its own PCI-X bus (133Mhz). Attached to this machine is our fibre
> channel SAN, using Seagate fibre-channel drives and an LSI dual channel
> 2gigabit fibre adapter on its own PCI-X bus. Local RW is ~135Mbytes/sec.
...
> I have read most of the tuning guides I can find on the net and
> attempted just about everything I can get my hands on (I have not tried
> jumbo frames yet, still waiting for some downtime to attempt that). My
> problem is that no matter how I tune the machines I can get at max
> 45Mb/ps throughput on NFS. This was the same throughput we were getting
> with our old server with PCI cards, moreover this throughput is roughly
> the same for every machine on our network. Theoretically we should be
> able to get much higher values.

I assume that you mean 45 MiB/s? Reading or writing? What are you
using for testing? What are the file sizes?

Have you validated network throughput using ttcp or netperf?

You say that you've read the tuning guides, but you haven't told us what
you have touched. Please tell us:

  o client-side NFS mount options
  o RAID configuration (level, stripe size, etc.)
  o I/O scheduler
  o queue depths (/sys/block/*/queue/nr_requests)
  o readahead (/sbin/blockdev --getra <device>)
  o mount options (e.g., are you using noatime)
  o filesystem type
  o journaling mode, if Ext3 or Reiserfs
  o journal size
  o internal or external journal
  o vm tunables:

      vm.dirty_writeback_centisecs
      vm.dirty_expire_centisecs
      vm.dirty_ratio
      vm.dirty_background_ratio
      vm.nr_pdflush_threads
      vm.vfs_cache_pressure

Regards,

  Bill Rugolsky
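Most of the items on Bill's list can be read directly from /proc/sys and sysfs. The Python sketch below is one hypothetical way to collect them in a single pass. The device name `sdb` is an assumption (substitute the block device backing /array1), the `root` argument exists only so the function can be pointed at a scratch directory for testing, and any file a given kernel does not provide (for example `vm.nr_pdflush_threads`, which later kernels no longer offer) is reported as None. The active I/O scheduler is the bracketed entry in `/sys/block/<dev>/queue/scheduler`. Note that `read_ahead_kb` is in KiB, whereas `blockdev --getra` reports 512-byte sectors, so a `--getra` value of 256 corresponds to 128 KiB.

```python
import os
from typing import Dict, Optional

# Sysctl names from the list above; vm.dirty_ratio maps to /proc/sys/vm/dirty_ratio.
VM_TUNABLES = [
    "vm.dirty_writeback_centisecs",
    "vm.dirty_expire_centisecs",
    "vm.dirty_ratio",
    "vm.dirty_background_ratio",
    "vm.nr_pdflush_threads",
    "vm.vfs_cache_pressure",
]


def read_value(path: str) -> Optional[str]:
    """Return the stripped contents of a /proc or /sys file, or None if unreadable."""
    try:
        with open(path) as f:
            return f.read().strip()
    except OSError:
        return None


def active_scheduler(text: str) -> Optional[str]:
    """Return the bracketed (active) entry from e.g. 'noop deadline [cfq]'."""
    for name in text.split():
        if name.startswith("[") and name.endswith("]"):
            return name[1:-1]
    return None


def collect(device: str = "sdb", root: str = "/") -> Dict[str, Optional[str]]:
    """Gather queue and VM settings for one block device (name is hypothetical)."""
    queue = os.path.join(root, "sys", "block", device, "queue")
    settings: Dict[str, Optional[str]] = {
        "nr_requests": read_value(os.path.join(queue, "nr_requests")),
        "read_ahead_kb": read_value(os.path.join(queue, "read_ahead_kb")),
    }
    sched = read_value(os.path.join(queue, "scheduler"))
    settings["scheduler"] = active_scheduler(sched) if sched is not None else None
    for name in VM_TUNABLES:
        # "vm.dirty_ratio" -> <root>/proc/sys/vm/dirty_ratio
        settings[name] = read_value(os.path.join(root, "proc", "sys", *name.split(".")))
    return settings


if __name__ == "__main__":
    for key, value in collect().items():
        print(f"{key} = {value}")
```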
* Re: NFS tuning - high performance throughput.
From: M. Todd Smith @ 2005-06-14 22:49 UTC
To: nfs

First off, thanks for the overwhelming response. I'll start with Bill's
response and fill in any holes after that.

Bill Rugolsky Jr. wrote:

>I assume that you mean 45 MiB/s? Reading or writing? What are you
>using for testing? What are the file sizes?
>
I'm not sure what a MiB/s is. I've been using the following for testing
writes:

time dd if=/dev/zero of=/mnt/array1/testfile5G.001 bs=512k count=10240

This writes a 5 GB file to the mounted NFS volume. I then take the time
reported once it finishes, calculate megabytes/second, and average over
ten separate tests, unmounting and remounting the volume after each test.

For reads I cat the file back to /dev/null:

time cat /mnt/array1/testfile5G.001 >> /dev/null

Read times are better, but not optimal either, usually sitting around
~70 Mbytes/sec.

>
>Have you validated network throughput using ttcp or netperf?
>
We did at one point validate network throughput with ttcp, although I
have yet to find a definitive guide to using ttcp. Here is some output:
sender:
ttcp-t: buflen=8192, nbuf=2048, align=16384/0, port=5001
ttcp-t: sockbufsize=65535, # udp -> test_sweet #
ttcp-t: 16777216 bytes in 0.141 real seconds = 116351.241 KB/sec +++
ttcp-t: 2054 I/O calls, msec/call = 0.070, calls/sec = 14586.514
ttcp-t: 0.000user 0.050sys 0:00real 35% 0i+0d 0maxrss 0+2pf 0+0csw

receiver:
ttcp-r: buflen=8192, nbuf=2048, align=16384/0, port=5001
ttcp-r: sockbufsize=65536, # udp #
ttcp-r: 16777216 bytes in 0.141 real seconds = 115970.752 KB/sec +++
ttcp-r: 2050 I/O calls, msec/call = 0.071, calls/sec = 14510.501
ttcp-r: 0.000user 0.059sys 0:00real 35% 0i+0d 0maxrss 0+1pf 2017+18csw

>You say that you've read the tuning guides, but you haven't told us what
>you have touched. Please tell us:
>
>  o client-side NFS mount options
>
exec,dev,suid,rw,rsize=32768,wsize=32768,timeo=500,retrans=10,retry=60,bg 1 0

>  o RAID configuration (level, stripe size, etc.)
>
RAID 5, 4k stripe size, XFS file system.

meta-data=/array1                isize=256    agcount=32, agsize=13302572 blks
         =                       sectsz=512
data     =                       bsize=4096   blocks=425682304, imaxpct=25
         =                       sunit=0      swidth=0 blks, unwritten=1
naming   =version 2              bsize=4096
log      =internal               bsize=4096   blocks=32768, version=1
         =                       sectsz=512   sunit=0 blks
realtime =none                   extsz=65536  blocks=0, rtextents=0

>  o I/O scheduler
>
Not sure what you mean here.
>  o queue depths (/sys/block/*/queue/nr_requests)
>
1024

>  o readahead (/sbin/blockdev --getra <device>)
>
256

>  o mount options (e.g., are you using noatime)
>
/array1 xfs logbufs=8,noatime,nodiratime

>  o filesystem type
>
XFS

>  o journaling mode, if Ext3 or Reiserfs
>
>  o journal size
>
>  o internal or external journal
>
log      =internal               bsize=4096   blocks=32768, version=1
         =                       sectsz=512   sunit=0 blks

>  o vm tunables:
>
>      vm.dirty_writeback_centisecs
>      vm.dirty_expire_centisecs
>      vm.dirty_ratio
>      vm.dirty_background_ratio
>      vm.nr_pdflush_threads
>      vm.vfs_cache_pressure
>
vm.vfs_cache_pressure = 100
vm.nr_pdflush_threads = 2
vm.dirty_expire_centisecs = 3000
vm.dirty_writeback_centisecs = 500
vm.dirty_ratio = 29
vm.dirty_background_ratio = 7

The SAN layout is as follows.

I did not set this part up and have had little time to catch up on it so
far. We initially tried to set it up so that we would stripe across both
arrays, but we had some problems and, due to time constraints on getting
the new system in place, had to go back to the two-array layout.

I just went and had a look; I'm not sure it all makes sense to me yet.

    ----------------------
        2*parity drives
        2*spare drives
    ----------------------
       |   |   |   |        (2 FC conns)
    ----------------------
           ARRAY 1
    ----------------------
       |   |   |   |
    ----------------------
           ARRAY 2
    ----------------------
       |   |   |   |
    ----------------------
      FC controller card
    -----------------------
       |   |   |   |
    -----------------------
       FC card on server
    -----------------------

Not sure why the connections are chained all the way through the system
like that; I'll have to ask our hardware vendor why it's set up that way.
Theoretically, the throughput to/from this SAN should be more in the range
of 300-400 MB/s, but I haven't had a chance to do any testing on that yet.

Using 256 NFS threads on the server, and the following sysctl settings:
net.ipv4.tcp_mem = 196608 262144 393216
net.ipv4.tcp_wmem = 4096 65536 8388608
net.ipv4.tcp_rmem = 4096 87380 8388608
net.core.rmem_default = 65536
net.core.rmem_max = 8388608
net.core.wmem_default = 65536
net.core.wmem_max = 8388608

Hyperthreading is turned off.

Also, if anyone can recommend some good NFS reference material, I'd love
to get my hands on it.

Cheers
Todd
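Because Mb, MB and MiB get mixed up in this thread, here is a small Python sketch of the unit arithmetic behind the figures above: converting a timed transfer into MB/s (10^6 bytes), MiB/s (2^20 bytes) and Mbit/s, averaging repeated runs the way the dd test is described, and computing a bandwidth-delay product for a gigabit link at the 200 microsecond ping time. The 100-second dd timing is an invented example value, and ttcp's "KB" is assumed to mean 1024 bytes. On those assumptions the ttcp sender figure works out to roughly 950 Mbit/s, close to gigabit line rate, and the bandwidth-delay product is about 25,000 bytes, well below the 64 KiB-plus default socket buffers listed above, which suggests the socket-buffer sysctls are unlikely to be the limiting factor on this LAN.

```python
# Unit arithmetic for the figures quoted in this thread (sketch only).
MB = 10**6    # decimal megabyte
MIB = 2**20   # binary mebibyte ("MiB")

# dd if=/dev/zero bs=512k count=10240 writes 512 KiB * 10240 = 5 GiB.
DD_BYTES = 512 * 1024 * 10240


def rates(nbytes: float, seconds: float) -> dict:
    """Express one timed transfer in MB/s, MiB/s and Mbit/s."""
    bytes_per_s = nbytes / seconds
    return {
        "MB/s": bytes_per_s / MB,
        "MiB/s": bytes_per_s / MIB,
        "Mbit/s": bytes_per_s * 8 / 10**6,
    }


def mean_mb_per_s(nbytes: float, timings: list) -> float:
    """Average of the per-run MB/s figures over repeated runs of one size."""
    return sum(nbytes / t / MB for t in timings) / len(timings)


def bdp_bytes(bits_per_s: float, rtt_s: float) -> float:
    """Bandwidth-delay product: bytes in flight needed to keep the link busy."""
    return bits_per_s / 8 * rtt_s


if __name__ == "__main__":
    print(rates(DD_BYTES, 100.0))  # invented 100 s timing for the 5 GiB dd run
    ttcp_kb_per_s = 116351.241     # ttcp-t sender figure above
    print(f"ttcp: {ttcp_kb_per_s * 1024 * 8 / 10**6:.0f} Mbit/s")  # assumes KB = 1024 bytes
    print(f"BDP: {bdp_bytes(10**9, 200e-6):.0f} bytes")             # 1 Gbit/s, 200 us RTT
```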
* RE: NFS tuning - high performance throughput.
From: Roger Heflin @ 2005-06-15 13:03 UTC
To: 'M. Todd Smith', nfs

Are you using the same dd test for the local-machine test? If so, cache
will be a major factor.

Also, on the RAID 5 stripe size: bigger is almost always better. I would
do some testing with different stripe sizes and see how they affect the
speed; I have never seen anything smaller than 32k be faster than 32k.

Are you using md for the RAID 5 setup, or something else that has not
been mentioned?

Roger
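Roger's stripe-size advice can be illustrated with some simple striping arithmetic. The Python sketch below assumes md-style striping, where the per-disk unit (md calls it the chunk size) is what this thread calls the stripe size, and assumes requests are aligned to chunk boundaries. It counts how many per-disk pieces a single 32 KiB NFS write (the wsize from the mount options earlier in the thread) is split into, and how much data one full RAID 5 stripe holds. The 8-disk count is hypothetical, since the thread never states how many drives each array has. With a 4k chunk, each 32 KiB request becomes eight 4 KiB pieces, so each disk only ever sees very small requests.

```python
import math

# RAID 5 striping arithmetic (illustrative sketch; aligned requests assumed).


def pieces_per_request(io_kib: int, chunk_kib: int) -> int:
    """Per-disk pieces an aligned request of io_kib is split into."""
    return math.ceil(io_kib / chunk_kib)


def full_stripe_kib(chunk_kib: int, disks: int) -> int:
    """Data held by one full RAID 5 stripe: one chunk on every non-parity disk."""
    if disks < 3:
        raise ValueError("RAID 5 needs at least 3 disks")
    return chunk_kib * (disks - 1)


if __name__ == "__main__":
    DISKS = 8        # hypothetical drive count, not given in the thread
    WSIZE_KIB = 32   # wsize=32768 from the client mount options
    for chunk in (4, 32, 64, 128):
        print(f"chunk {chunk:>3}k: {pieces_per_request(WSIZE_KIB, chunk)} piece(s) "
              f"per 32k write, full stripe {full_stripe_kib(chunk, DISKS)}k")
```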
* Re: NFS tuning - high performance throughput.
  2005-06-15 13:03 ` Roger Heflin
@ 2005-06-15 14:47   ` M. Todd Smith
  2005-06-15 15:28     ` Roger Heflin
  0 siblings, 1 reply; 22+ messages in thread
From: M. Todd Smith @ 2005-06-15 14:47 UTC
  To: nfs

Roger Heflin wrote:

>Are you using the same dd test on the local machine test? If so
>cache will be a major factor.
>
>Also, raid5 stripe size: bigger is almost always better. I would
>do some testing with different stripe sizes and see how it affects
>the speed; I have never seen a stripe smaller than 32k be faster
>than 32k.
>
>Are you using md for the raid5 setup or something else that has
>not been mentioned?
>
>                  Roger
>

Roger,

I was told by a consultant we hired to help us with this that 5GB test
files should be large enough to blow out the caches. Looking back, I
realize that must have been when we only had 2GB RAM in the machine.
Is there a better test I can use to check local read/write performance?
Also, is there a formula or recommended way to calculate how big a file
you should read/write to properly check local read/write?

I'll have to ask our software engineer who worked with the consultant
why that particular stripe size was used. I believe we are using md;
we attempted to use LVM but ran into some problems and had some file
inconsistencies that were unacceptable, so we had to back out of it.

Cheers
Todd
* RE: NFS tuning - high performance throughput.
  2005-06-15 14:47 ` M. Todd Smith
@ 2005-06-15 15:28   ` Roger Heflin
  2005-06-15 19:13     ` Dan Stromberg
  0 siblings, 1 reply; 22+ messages in thread
From: Roger Heflin @ 2005-06-15 15:28 UTC
  To: 'M. Todd Smith', nfs

Make the dd test bigger. With 2x the memory size, 50% will be from
cache and almost instantaneous; with 4x the memory size, 25% will be
from cache. Bigger runs will get closer to actual reality. On the
tests I do I run things at 8x, but if you run 2x, 4x and 8x memory
you should get a decent graph to give you an idea of what the actual
value is.

                              Roger
                              Atipa Technologies.

> -----Original Message-----
> From: nfs-admin@lists.sourceforge.net
> [mailto:nfs-admin@lists.sourceforge.net] On Behalf Of M. Todd Smith
> Sent: Wednesday, June 15, 2005 9:47 AM
> To: nfs@lists.sourceforge.net
> Subject: Re: [NFS] NFS tuning - high performance throughput.
>
> I was told by a consultant we hired to help us with this that
> 5Gb test files should be large enough to blow out the caches.
> Looking back I realize that must have been when we only had
> 2Gb RAM in the machine ..
> is there a better test I can use to check local rw
> performance, as well is there a formula or recommended way to
> calculate how big a file you should rw to properly check local rw?
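Roger's rule of thumb can be written down directly. A minimal sketch, assuming (as his percentages do) that at most one RAM's worth of the file can be absorbed by the page cache, so the cached share is roughly RAM divided by file size; the kernel never devotes all of RAM to cache, so treat the result as an upper bound. It also converts each size into a `count=` for the `bs=512k` dd command used earlier in the thread.

```python
GiB = 1024 ** 3
BLOCK = 512 * 1024  # dd bs=512k


def cached_fraction(file_bytes: int, ram_bytes: int) -> float:
    """Upper bound on the share of a sequential test the page cache can absorb."""
    return min(1.0, ram_bytes / file_bytes)


def dd_count(file_bytes: int, block_bytes: int = BLOCK) -> int:
    """Number of dd blocks needed to cover file_bytes (ceiling division)."""
    return -(-file_bytes // block_bytes)


ram = 4 * GiB  # the new server's 4 GB of RAM
for multiple in (2, 4, 8):
    size = multiple * ram
    print(f"{multiple}x RAM: {size // GiB} GiB, count={dd_count(size)}, "
          f"up to {cached_fraction(size, ram):.0%} from cache")

# The consultant's 5 GiB test (bs=512k count=10240) on the same machine:
print(f"5 GiB test: up to {cached_fraction(5 * GiB, ram):.0%} from cache")
```

The last line shows why a fixed test size goes stale: the same 5 GiB file that was 2.5x RAM on the old 2 GB box is only 1.25x RAM now.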
* RE: NFS tuning - high performance throughput.
  2005-06-15 15:28 ` Roger Heflin
@ 2005-06-15 19:13   ` Dan Stromberg
  2005-06-15 19:52     ` Roger Heflin
  0 siblings, 1 reply; 22+ messages in thread
From: Dan Stromberg @ 2005-06-15 19:13 UTC
  To: Roger Heflin; +Cc: 'M. Todd Smith', nfs, strombrg

A bigger dd test should work - if you use about 3x your physical
memory, you'll probably be fine doing it that way.

Another way that may work better in -some- cases is to umount the
filesystem and re-mount it - except for warm restartable filesystems.

On Wed, 2005-06-15 at 10:28 -0500, Roger Heflin wrote:
> Make the dd test bigger, with 2x the memory size, 50% will
> be from cache and almost instanteous, with 4x the memory size,
> 25% will be from cache, bigger runs will get closer to actual
> reality, on the tests I do I run things 8x, but if you run
> 2x, 4x, 8x memory you should get a decent graph to give you
> an idea of what the actual value is.
>
>                               Roger
>                               Atipa Technologies.
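Dan's 3x guideline can be derived from the machine itself instead of a hard-coded figure. A sketch, assuming a Linux host whose `/proc/meminfo` reports `MemTotal` in kB (1024-byte units); the parser takes the file's text so it can be exercised without a real `/proc`, and `dd_args` is an illustrative helper name, not part of any tool.

```python
import re


def mem_total_bytes(meminfo_text: str) -> int:
    """Extract MemTotal from /proc/meminfo content (reported in kB = 1024 bytes)."""
    m = re.search(r"^MemTotal:\s+(\d+)\s+kB", meminfo_text, re.MULTILINE)
    if m is None:
        raise ValueError("MemTotal not found")
    return int(m.group(1)) * 1024


def dd_args(meminfo_text: str, multiple: int = 3, bs_kib: int = 512) -> str:
    """Hypothetical helper: dd size arguments for a file of `multiple` x physical RAM."""
    total = multiple * mem_total_bytes(meminfo_text)
    count = -(-total // (bs_kib * 1024))  # round up to whole blocks
    return f"bs={bs_kib}k count={count}"


if __name__ == "__main__":
    try:
        with open("/proc/meminfo") as f:
            print(dd_args(f.read()))
    except OSError:
        print("no /proc/meminfo on this system")
```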
* RE: NFS tuning - high performance throughput.
  2005-06-15 19:13 ` Dan Stromberg
@ 2005-06-15 19:52   ` Roger Heflin
  2005-06-15 20:11     ` Dan Stromberg
  0 siblings, 1 reply; 22+ messages in thread
From: Roger Heflin @ 2005-06-15 19:52 UTC
  To: 'Dan Stromberg'; +Cc: 'M. Todd Smith', nfs

The mount/remount approach (unless you time how long the unmount
takes) can still skew the write test, because writes will still be
happening for quite a while after the dd completes.

                              Roger

> -----Original Message-----
> From: Dan Stromberg [mailto:strombrg@dcs.nac.uci.edu]
> Sent: Wednesday, June 15, 2005 2:14 PM
> To: Roger Heflin
> Cc: 'M. Todd Smith'; nfs@lists.sourceforge.net;
> strombrg@dcs.nac.uci.edu
> Subject: RE: [NFS] NFS tuning - high performance throughput.
>
> A bigger dd test should work - if you use about 3x your
> physical memory, you'll probably be fine doing it that way.
>
> Another way that may work better in -some- cases, is to
> umount the filesystem and re-mount it - except for warm
> restartable filesystems.
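One way to observe what Roger describes is to watch how much dirty data is still queued after dd returns. A sketch, assuming Linux's `/proc/meminfo` `Dirty` and `Writeback` fields (in kB); the threshold, poll interval and timeout are arbitrary illustrative defaults, and other writers on the machine will also keep these counters above zero.

```python
import time


def pending_writeback_kib(meminfo_text: str) -> int:
    """Return Dirty + Writeback (in kB) parsed from /proc/meminfo text."""
    fields = {}
    for line in meminfo_text.splitlines():
        name, sep, rest = line.partition(":")
        parts = rest.split()
        if sep and parts and parts[0].isdigit():
            fields[name.strip()] = int(parts[0])
    return fields.get("Dirty", 0) + fields.get("Writeback", 0)


def wait_for_flush(threshold_kib: int = 1024, poll_s: float = 0.5,
                   timeout_s: float = 600.0, path: str = "/proc/meminfo") -> float:
    """Poll until pending writeback falls below threshold_kib; return seconds waited."""
    start = time.monotonic()
    while True:
        with open(path) as f:
            if pending_writeback_kib(f.read()) < threshold_kib:
                return time.monotonic() - start
        if time.monotonic() - start > timeout_s:
            raise TimeoutError("dirty data did not drain; other writers may be active")
        time.sleep(poll_s)
```

Calling `wait_for_flush()` right after dd exits and adding its result to dd's elapsed time gives a write figure that includes the deferred writeback.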
* RE: NFS tuning - high performance throughput.
  2005-06-15 19:52 ` Roger Heflin
@ 2005-06-15 20:11   ` Dan Stromberg
  2005-06-15 20:31     ` Roger Heflin
  2005-06-15 20:33     ` Chris Penney
  0 siblings, 2 replies; 22+ messages in thread
From: Dan Stromberg @ 2005-06-15 20:11 UTC
  To: Roger Heflin; +Cc: 'M. Todd Smith', nfs, strombrg

On Wed, 2005-06-15 at 14:52 -0500, Roger Heflin wrote:
> The mount/remount (unless you time how long it takes to unmount),
> will potentially still skew the write test, and the writes
> will still be happening for quite a while after the dd completes.

Or unless you sync twice?  Or write synchronously.

:)
* RE: NFS tuning - high performance throughput.
  2005-06-15 20:11 ` Dan Stromberg
@ 2005-06-15 20:31   ` Roger Heflin
  2005-06-15 20:33   ` Chris Penney
  1 sibling, 0 replies; 22+ messages in thread
From: Roger Heflin @ 2005-06-15 20:31 UTC
  To: 'Dan Stromberg'; +Cc: 'M. Todd Smith', nfs

If you do fully synchronous writes, things are horribly slow.

If you are doing it in C code, a single sync kernel call before
the close works nicely to flush everything.

                              Roger

> -----Original Message-----
> From: Dan Stromberg [mailto:strombrg@dcs.nac.uci.edu]
> Sent: Wednesday, June 15, 2005 3:11 PM
> To: Roger Heflin
> Cc: 'M. Todd Smith'; nfs@lists.sourceforge.net;
> strombrg@dcs.nac.uci.edu
> Subject: RE: [NFS] NFS tuning - high performance throughput.
>
> On Wed, 2005-06-15 at 14:52 -0500, Roger Heflin wrote:
> > The mount/remount (unless you time how long it takes to
> > unmount), will potentially still skew the write test, and the
> > writes will still be happening for quite a while after the dd
> > completes.
>
> Or unless you sync twice?  Or write synchronously.
>
> :)
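The two approaches debated here can be compared side by side. A sketch, assuming a POSIX system: `mode="osync"` opens the file with `os.O_SYNC` so every `write()` waits for stable storage (Dan's "write synchronously"), and `mode="fsync"` does buffered writes followed by one `os.fsync()` before `close()` (Roger's suggestion, using the per-file fsync(2) in place of a system-wide sync(2)). In both modes the clock stops only after the flush. The `timed_write` helper and the temporary target path are illustrative; point the path at the filesystem under test.

```python
import os
import tempfile
import time

MiB = 1024 * 1024


def timed_write(path: str, total_bytes: int, block_bytes: int = 512 * 1024,
                mode: str = "fsync") -> float:
    """Write zeros to path and return MiB/s, with the flush inside the timing.

    mode="osync": open with O_SYNC, so each write() waits for stable storage.
    mode="fsync": buffered writes, then a single fsync() before close().
    """
    flags = os.O_WRONLY | os.O_CREAT | os.O_TRUNC
    if mode == "osync":
        flags |= os.O_SYNC
    elif mode != "fsync":
        raise ValueError(f"unknown mode {mode!r}")
    buf = bytes(block_bytes)
    start = time.monotonic()
    fd = os.open(path, flags, 0o644)
    try:
        written = 0
        while written < total_bytes:
            written += os.write(fd, buf[: min(block_bytes, total_bytes - written)])
        if mode == "fsync":
            os.fsync(fd)  # one flush of this file's dirty pages before the clock stops
    finally:
        os.close(fd)
    elapsed = max(time.monotonic() - start, 1e-9)
    return written / MiB / elapsed


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        target = os.path.join(d, "testfile")
        for m in ("osync", "fsync"):
            print(m, f"{timed_write(target, 16 * MiB, mode=m):.1f} MiB/s")
```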
* Re: NFS tuning - high performance throughput.
  2005-06-15 20:11 ` Dan Stromberg
  2005-06-15 20:31   ` Roger Heflin
@ 2005-06-15 20:33   ` Chris Penney
  1 sibling, 0 replies; 22+ messages in thread
From: Chris Penney @ 2005-06-15 20:33 UTC
  To: nfs

On 6/15/05, Dan Stromberg <strombrg@dcs.nac.uci.edu> wrote:
> On Wed, 2005-06-15 at 14:52 -0500, Roger Heflin wrote:
> > The mount/remount (unless you time how long it takes to unmount),
> > will potentially still skew the write test, and the writes
> > will still be happening for quite a while after the dd completes.
>
> Or unless you sync twice?  Or write synchronously.
>

You can optionally use iozone (www.iozone.org); with '-c -e' it
includes fsync() and close() times in the results. It also has an
option to unmount the volume between iterations.

Also, I missed that he was using a 4k stripe size before. That's got
to be his issue. My LUNs are 8+1 h/w RAID 5 using a 64k segment size
per disk, and I use 512k (8*64) as the stripe size. As was
intelligently noted, you really want your writes to fill a whole
stripe if you can. If they do not, my understanding is that you
should lower both your segment size and stripe width. I think you
always want your stripe width to match the width of your RAID 5 LUN.

Chris
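Chris's stripe arithmetic, as a sketch: a RAID 5 LUN with N data disks and a per-disk segment size S has a full stripe of N x S bytes, and a write avoids the read/modify/write cycle only if it starts on a stripe boundary and covers whole stripes. The helper names are hypothetical; the 8+1 LUN with 64k segments is Chris's configuration, and the 32k request is the rsize/wsize used in this thread.

```python
KiB = 1024


def full_stripe_bytes(data_disks: int, segment_bytes: int) -> int:
    """Full-stripe width of a RAID 5 LUN: data disks times per-disk segment size."""
    return data_disks * segment_bytes


def is_full_stripe_write(offset: int, length: int, stripe: int) -> bool:
    """True if the write starts on a stripe boundary and covers whole stripes only."""
    return length > 0 and offset % stripe == 0 and length % stripe == 0


stripe = full_stripe_bytes(8, 64 * KiB)  # 8+1 LUN, 64k segments
print(stripe // KiB, "KiB full stripe")
print(is_full_stripe_write(0, 512 * KiB, stripe))  # aligned 512k write
print(is_full_stripe_write(0, 32 * KiB, stripe))   # a lone 32k (wsize) write
```

An isolated 32 KiB write would be a partial-stripe write on such a LUN, although the server's page cache may coalesce NFS writes into larger requests before they reach the array.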
* Re: NFS tuning - high performance throughput.
  2005-06-14 22:49 ` M. Todd Smith
  2005-06-15 13:03   ` Roger Heflin
@ 2005-06-15 17:47   ` Bill Rugolsky Jr.
  2005-06-15 20:33     ` M. Todd Smith
  2005-06-15 22:47     ` Greg Banks
  1 sibling, 2 replies; 22+ messages in thread
From: Bill Rugolsky Jr. @ 2005-06-15 17:47 UTC
  To: M. Todd Smith; +Cc: nfs

M. Todd Smith wrote:
> I'm not sure what a MiB/s is. I've been using the following for testing
> writes.

MiB = 2^20 bytes
MB  = 10^6 bytes

> time dd if=/dev/zero of=/mnt/array1/testfile5G.001 bs=512k count=10240

Small file and large file tests are by nature quite different, as are
cached and uncached reads and writes.

For a large file test, I'd use several times the RAM in your machine
(say 16-20GB). For small file tests, 100-200MB. To separate out the
effects of your SAN performance from knfsd performance, you may want to
do the small file test by exporting an ext2 filesystem from a ramdisk,
or a loopback file mount in /dev/shm. [Unfortunately, the tmpfs
filesystem doesn't implement the required methods directly, as it would
be handy for testing.]

For uncached reads/writes, consider using the new upstream coreutils:

ftp://alpha.gnu.org/gnu/coreutils/coreutils-5.3.0.tar.bz2

 dd has new iflag= and oflag= options with the following flags:

   append     append mode (makes sense for output file only)
   direct     use direct I/O for data
   dsync      use synchronized I/O for data
   sync       likewise, but also for metadata
   nonblock   use non-blocking I/O
   nofollow   do not follow symlinks
   noctty     do not assign controlling terminal from file

[N.B.: NFS Direct-I/O requests > 16M may Oops on kernels prior to 2.6.11.]

> ttcp-r: 16777216 bytes in 0.141 real seconds = 115970.752 KB/sec +++

UDP result looks OK. How about TCP? What about packet reordering on
your bonded 4 port NIC?

> exec,dev,suid,rw,rsize=32768,wsize=32768,timeo=500,retrans=10,retry=60,bg

UDP?

I wouldn't use UDP with such a large rsize/wsize -- that's two dozen
fragments on a 1500 MTU network!
You also have, due to the bonding, an effectively mixed-speed network
*and* packet reordering.

Have you looked at your interface statistics? Does everything look
fine?

These days, I'd use TCP. The Linux NFS TCP client is very mature, and
the NFS TCP server is working fine for me. Linux NFS UDP fragment
handling / retry logic has long been a source of problems, particularly
across mixed-speed networks (e.g., 100/1000). TCP adapts automatically.

While TCP requires slightly more processing overhead, this should not
be an issue on modern CPUs. Additionally, modern NICs like the e1000
support TSO (TCP Segmentation Offload), and though TSO has had its
share of bugs, it is the better path forward.

IMHO, packet reordering at the TCP layer is something that has received
attention in the Linux kernel, and there are ways to measure it and
compensate for it (via /proc/sys/net/ipv4/* tunables). I'd much rather
try to understand the issue there than at either the IP fragment layer
or the kernel RPC layer.

> RAID 5, 4k strip size, XFS file system.

4K? That's pretty tiny. OTOH, using too large a stripe with NFS over
RAID5 can be no good either, if it results in partial writes that
require a read/modify/write cycle, so it is perhaps best not to go very
large. If your SAN gives you statistics about the distribution of write
sizes coming from the NFS server, that would help in choosing a stripe
size.

Sorry, I know very little about XFS.

> > o I/O scheduler
>
> Not sure what you mean here.

The disk "elevator algorithm": anticipatory, deadline, or cfq.

  grep . /dev/null /sys/block/*/queue/scheduler

IMHO, anticipatory is good for a workstation, but not so good for a
file server. But that shouldn't affect your sequential I/O tests.

> > o queue depths (/sys/block/*/queue/nr_requests)
> 1024

Sane.
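Each `queue/scheduler` file that Bill's grep reads lists the available elevators with the active one in square brackets (for example `noop anticipatory deadline [cfq]`). A sketch that reports the active choice per device, assuming the standard `/sys/block/<device>/queue/scheduler` layout; the function names are illustrative.

```python
import glob
import os
import re
from typing import Dict, Optional


def active_scheduler(contents: str) -> Optional[str]:
    """Return the bracketed (active) entry, e.g. 'cfq' from 'noop deadline [cfq]'."""
    m = re.search(r"\[([^\]]+)\]", contents)
    return m.group(1) if m else None


def schedulers_by_device(pattern: str = "/sys/block/*/queue/scheduler") -> Dict[str, Optional[str]]:
    """Map each block device to its active I/O scheduler (same files as the grep)."""
    result: Dict[str, Optional[str]] = {}
    for path in sorted(glob.glob(pattern)):
        # /sys/block/<device>/queue/scheduler -> <device>
        device = os.path.basename(os.path.dirname(os.path.dirname(path)))
        with open(path) as f:
            result[device] = active_scheduler(f.read())
    return result


if __name__ == "__main__":
    for dev, sched in schedulers_by_device().items():
        print(f"{dev}: {sched}")
```

On kernels that support runtime switching, writing one of the listed names into the same file as root changes the elevator for that device.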
> > o readahead (/sbin/blockdev --getra <device>)
>
> 256

You might want to compare a local sequential read test with

  /sbin/blockdev --setra {...,4096,8192,16384,...} <device>

Traffic on the linux-lvm list suggests increasing the readahead on the
logical device and decreasing it on the underlying physical devices,
but your mileage may vary. Setting it too high will pessimize random
I/O performance.

> vm.vfs_cache_pressure = 100
> vm.nr_pdflush_threads = 2
> vm.dirty_expire_centisecs = 3000
> vm.dirty_writeback_centisecs = 500
> vm.dirty_ratio = 29
> vm.dirty_background_ratio = 7

Experience with Ext3 data journaling indicates that dropping
expire/writeback can help to smooth out I/O:

  vm.dirty_expire_centisecs = {300-1000}
  vm.dirty_writeback_centisecs = {50-100}

Again, I have no experience with XFS. Since it only does metadata
journaling (the equivalent of Ext3 data=writeback), its performance
characteristics are probably quite different.

Regards,

	Bill
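Bill's "two dozen fragments" figure can be checked with a little arithmetic. A sketch, assuming a 20-byte IPv4 header without options and an 8-byte UDP header; every fragment except the last must carry a multiple of 8 payload bytes. The RPC/NFS header adds a little more and is ignored here, so the count is a lower bound.

```python
IP_HEADER = 20   # IPv4 header without options
UDP_HEADER = 8


def ip_fragments(payload_bytes: int, mtu: int) -> int:
    """Number of IPv4 fragments needed for one UDP datagram with this payload."""
    datagram = UDP_HEADER + payload_bytes        # bytes carried as IP payload
    per_fragment = (mtu - IP_HEADER) // 8 * 8    # fragment offsets are in 8-byte units
    return -(-datagram // per_fragment)          # ceiling division


for mtu in (1500, 9000):
    print(f"MTU {mtu}: {ip_fragments(32768, mtu)} fragments per 32k read/write")
```

That is about 23 fragments per request at a 1500-byte MTU, and losing any one of them discards the whole datagram, so the RPC has to be retransmitted. Jumbo frames (MTU 9000), which had not yet been tried on this network, would cut the count to about 4.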
* Re: NFS tuning - high performance throughput. 2005-06-15 17:47 ` Bill Rugolsky Jr. @ 2005-06-15 20:33 ` M. Todd Smith 2005-06-15 22:43 ` Bill Rugolsky Jr. 2005-06-15 22:47 ` Greg Banks 1 sibling, 1 reply; 22+ messages in thread From: M. Todd Smith @ 2005-06-15 20:33 UTC (permalink / raw) To: nfs Bill Rugolsky Jr. wrote: > MiB = 2^20 Bytes > MB = 10^6 bytes > > > Thanks for clearing that up .. nowhere near that speed. >Small file and large file tests are by nature quite different, as are >cached and uncached reads and writes. > >For a large file test, I'd use several times the RAM in your machine >(say 16-20GB). For small file tests, 100-200MB. To separate out the >effects of your SAN performance from knfsd performance, you may want to do >the small file test by exporting a (ext2) filesystem from a ramdisk, or >a loopback file mount in /dev/shm. [Unfortunately, the tmpfs filesystem >doesn't implement the required methods directly, as it would be handy for >testing.] > >For uncached reads/writes, consider using the new upstream coreutils: > >ftp://alpha.gnu.org/gnu/coreutils/coreutils-5.3.0.tar.bz2 > > dd has new iflag= and oflag= options with the following flags: > > append append mode (makes sense for output file only) > direct use direct I/O for data > dsync use synchronized I/O for data > sync likewise, but also for metadata > nonblock use non-blocking I/O > nofollow do not follow symlinks > noctty do not assign controlling terminal from file > >[N.B.: NFS Direct-I/O requests > 16M may Oops on kernels prior to 2.6.11.] > > > I'll try out the new core-utils when I can (*hoping next week I can get this server out of production*). I'm sorry what would writing to a virtual FS tell me in regards to my SAN, perhaps you can explain in more detail? >>ttcp-r: 16777216 bytes in 0.141 real seconds = 115970.752 KB/sec +++ >> >> > >UDP result looks OK. How about TCP? What about packet reordering on >your bonded 4 port NIC? 
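To put the MiB/MB definitions and the ttcp result side by side with the 45 MiB/s NFS figure, here is a Python sketch of the arithmetic. One assumption: judging from the posted numbers (16384 KiB / 115970.752 KiB/s = 0.1413 s, printed as 0.141), ttcp's "KB" is 1024 bytes.

```python
MIB = 2**20  # bytes, per the definition quoted above
MB = 10**6   # bytes


def mib_s_to_mbit_s(mib_per_s: float) -> float:
    """MiB/s -> megabits/s (10^6 bits), the unit link speeds are quoted in."""
    return mib_per_s * MIB * 8 / 10**6


def kib_s_to_mbit_s(kib_per_s: float) -> float:
    """KiB/s -> megabits/s."""
    return kib_per_s * 1024 * 8 / 10**6


def ttcp_rate_kib_s(nbytes: int, seconds: float) -> float:
    """ttcp-style KB/sec, assuming KB = 1024 bytes."""
    return nbytes / 1024 / seconds


if __name__ == "__main__":
    print(f"NFS 45 MiB/s          = {mib_s_to_mbit_s(45):7.1f} Mbit/s")
    print(f"ttcp 115970.752 KB/s  = {kib_s_to_mbit_s(115970.752):7.1f} Mbit/s")
    print(f"1 Gbit/s line rate    = {1e9 / 8 / MIB:7.1f} MiB/s")
    print(f"                      = {1e9 / 8 / MB:7.1f} MB/s")
```

On these numbers the raw network path carried about 950 Mbit/s in the UDP ttcp test while NFS delivered about 377 Mbit/s, which suggests (though it does not prove) that the limit lies above the raw network path.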
> > > >>exec,dev,suid,rw,rsize=32768,wsize=32768,timeo=500,retrans=10,retry=60,bg >> >> > >UDP? > >I wouldn't use UDP with such a large rsize/wsize -- that's two dozen >fragments on a 1500 MTU network! You also have, due to the bonding, >an effectively mixed-speed network *and* packet reordering. > >Have you looked at your interface statistics? Does everything look >fine? > > I'm very apt to agree with you, I see no reason to continue to use UDP for NFS traffic and have read that the UDP fragment handling in Linux was sub-par. Here are some netstat -s stats from the server: Ip: 446801331 total packets received 0 forwarded 0 incoming packets discarded 314401713 incoming packets delivered 256822806 requests sent out 5800 fragments dropped after timeout 143422528 reassemblies required 11022911 packets reassembled ok 246950 packet reassembles failed 48736566 fragments received ok Icmp: 25726 ICMP messages received 0 input ICMP message failed. ICMP input histogram: timeout in transit: 25709 echo requests: 14 echo replies: 3 5259 ICMP messages sent 0 ICMP messages failed ICMP output histogram: destination unreachable: 2189 time exceeded: 3056 echo replies: 14 Tcp: 34 active connections openings 675 passive connection openings 0 failed connection attempts 2 connection resets received 3 connections established 139364522 segments received 82043064 segments send out 35697 segments retransmited 0 bad segments received. 232 resets sent Udp: 175434421 packets received 2189 packets to unknown port received. 0 packet receive errors 549511042 packets sent TcpExt: ArpFilter: 0 294 TCP sockets finished time wait in fast timer 165886 delayed acks sent 310 delayed acks further delayed because of locked socket Quick ack mode was activated 84 times 5347 packets directly queued to recvmsg prequeue. 
3556184 packets directly received from backlog 7451568 packets directly received from prequeue 115727204 packets header predicted 7693 packets header predicted and directly queued to user TCPPureAcks: 7228029 TCPHPAcks: 22682518 TCPRenoRecovery: 37 TCPSackRecovery: 7688 TCPSACKReneging: 0 TCPFACKReorder: 12 TCPSACKReorder: 101 TCPRenoReorder: 0 TCPTSReorder: 949 TCPFullUndo: 1209 TCPPartialUndo: 6887 TCPDSACKUndo: 2506 TCPLossUndo: 237 TCPLoss: 23727 TCPLostRetransmit: 6 TCPRenoFailures: 0 TCPSackFailures: 291 TCPLossFailures: 12 TCPFastRetrans: 23567 TCPForwardRetrans: 6191 TCPSlowStartRetrans: 3769 TCPTimeouts: 1505 TCPRenoRecoveryFail: 0 TCPSackRecoveryFail: 355 TCPSchedulerFailed: 0 TCPRcvCollapsed: 0 TCPDSACKOldSent: 84 TCPDSACKOfoSent: 0 TCPDSACKRecv: 7454 TCPDSACKOfoRecv: 1 TCPAbortOnSyn: 0 TCPAbortOnData: 0 TCPAbortOnClose: 1 TCPAbortOnMemory: 0 TCPAbortOnTimeout: 0 TCPAbortOnLinger: 0 TCPAbortFailed: 0 TCPMemoryPressures: 0 Regarding the bonding .. Writes to the SAN happen on a single port of the NIC so in writing there are very few reorderings needed. Reading from the SAN breaks the read up on the four ports and so the most reordering would be done client side (even worse most of our clients are still RH 7.2). If I mix TCP and UDP NFS connections will speed be slower than if I used just straight TCP conns? I'll do some testing next week and report my findings. >These days, I'd use TCP. The Linux NFS TCP client is very mature, >and the NFS TCP server is working fine for me. Linux NFS UDP fragment >handling / retry logic has long been a source of problems, particularly >across mixed-speed networks (e.g., 100/1000). TCP adapts automatically. >While TCP requires slightly more processing overhead, this should not be >an issue on modern CPUs. Additionally, modern NICs like e1000 support >TSO (TCP Segmentation Offload), and though TSO has had its share of bugs, >it is the better path forward. 
> >IMHO, packet reordering at the TCP layer is something that has received >attention in the Linux kernel, and there are ways to measure it and >compensate for it (via /proc/sys/net/ipv4/* tunables). I'd much rather >try and understand the issue there than at either the IP fragment layer >or the kernel RPC layer. > > > This as my first recommendation when I began here .. Is TSO stable enough for production level usage now? Suse still turns it off by default. I'm still looking into the other things you mentioned .. thanks again for your help. Cheers Todd -- Systems Administrator ---------------------------------- Soho VFX - Visual Effects Studio 99 Atlantic Avenue, Suite 303 Toronto, Ontario, M6K 3J8 (416) 516-7863 http://www.sohovfx.com ----------------------------------
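A rough way to quantify the IP reassembly counters in the netstat -s output above, as a Python sketch. The failed / (ok + failed) ratio is an illustrative metric, not something netstat itself reports, and the kernel counters do not map one-to-one onto NFS requests. The same parse can be run against client-side output, which is where most of the reordering would land.

```python
import re

# Excerpt of the server's `netstat -s` IP section, copied from the message above.
SAMPLE = """\
Ip:
    446801331 total packets received
    5800 fragments dropped after timeout
    143422528 reassemblies required
    11022911 packets reassembled ok
    246950 packet reassembles failed
"""


def ip_counters(text: str) -> dict:
    """Parse '<number> <description>' lines into {description: number}."""
    counters = {}
    for line in text.splitlines():
        match = re.match(r"\s*(\d+)\s+(.+?)\s*$", line)
        if match:
            counters[match.group(2)] = int(match.group(1))
    return counters


def reassembly_failure_ratio(counters: dict) -> float:
    """failed / (ok + failed): a rough, illustrative health metric."""
    ok = counters["packets reassembled ok"]
    failed = counters["packet reassembles failed"]
    return failed / (ok + failed)


if __name__ == "__main__":
    ratio = reassembly_failure_ratio(ip_counters(SAMPLE))
    print(f"reassembly failures: {ratio:.2%}")
```

For the server numbers posted here the ratio comes to roughly 2%.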
* Re: NFS tuning - high performance throughput. 2005-06-15 20:33 ` M. Todd Smith @ 2005-06-15 22:43 ` Bill Rugolsky Jr. 0 siblings, 0 replies; 22+ messages in thread From: Bill Rugolsky Jr. @ 2005-06-15 22:43 UTC (permalink / raw) To: M. Todd Smith; +Cc: nfs On Wed, Jun 15, 2005 at 04:33:05PM -0400, M. Todd Smith wrote: > I'll try out the new core-utils when I can (*hoping next week I can get > this server out of production*). As someone else noted, iozone is a good benchmark, and it also has a command-line switch for O_DIRECT. > I'm sorry what would writing to a virtual FS tell me in regards to > my SAN, perhaps you can explain in more detail? It wouldn't tell you anything about your RAID storage, but it would give you a good idea of the upper limit on knfsd write performance for small files, since you'd be storing to RAM. Read tests are as readily done just by ensuring that the file is in the page cache; write tests with the real fs are a bit harder. Another "good-enough" option for estimating knfsd performance limits may be to set the NFS export option to the unsafe "async" mode, use a small ext2 filesystem, and look at small file write throughput. [After all, /tmp on ext2 was for many years about as fast as Solaris tmpfs. :-)] Having done that, you'll know whether the storage configuration is the proximate cause of the lousy performance. > Ip: > 446801331 total packets received > 0 forwarded > 0 incoming packets discarded > 314401713 incoming packets delivered > 256822806 requests sent out > 5800 fragments dropped after timeout > 143422528 reassemblies required > 11022911 packets reassembled ok > 246950 packet reassembles failed > 48736566 fragments received ok That's a fair number of reassembly failures. In light of your next point, you may want to look at the stats on the client side ... > Regarding the bonding .. Writes to the SAN happen on a single port of > the NIC so in writing there are very few reorderings needed. 
Reading > from the SAN breaks the read up on the four ports and so the most > reordering would be done client side (even worse most of our clients are > still RH 7.2). OK, I just went and re-read bonding.txt. Thanks. :-) > If I mix TCP and UDP NFS connections will speed be > slower than if I used just straight TCP conns? I'll do some testing > next week and report my findings. Well, as I said, I'm disinclined to run over UDP with such a large r/wsize of 32K; we did that once-upon-a-time with Linux and Solaris hosts talking to a NetApp, and the results were unpleasant; we had to use r/wsize=8K until we were able to switch to TCP. Things might look fine with a single client running a benchmark, and look very different on a congested network. Also, I'm not sure about fairness with respect to a mix of TCP and UDP clients; if the network is loaded and UDP starts to timeout, will the UDP clients recover if the TCP clients are active? > This as my first recommendation when I began here .. Is TSO stable > enough for production level usage now? Suse still turns it off by default. As a general proposition, I'd say no to blindly using it without a good bit of testing. David S. Miller has been posting his rewritten TSO framework to netdev in the last few weeks; perhaps by 2.6.13 ... -Bill
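The "two dozen fragments" figure follows from simple arithmetic. Here is a Python sketch, assuming a 20-byte IPv4 header without options and an 8-byte UDP header. rpc_overhead defaults to 0 because the RPC/NFS header size isn't given in the thread; a hundred or so bytes would not change the 32K count. The survival estimate assumes independent per-fragment loss, which is a simplification. The underlying point is that losing any one fragment discards the whole datagram, which the RPC layer then has to retransmit after a timeout.

```python
import math


def ip_fragments(nfs_payload: int, mtu: int = 1500, ip_header: int = 20,
                 udp_header: int = 8, rpc_overhead: int = 0) -> int:
    """IPv4 fragments needed for one NFS-over-UDP datagram."""
    # Each non-final fragment must carry a multiple of 8 payload bytes.
    per_fragment = (mtu - ip_header) // 8 * 8
    datagram = udp_header + rpc_overhead + nfs_payload
    return math.ceil(datagram / per_fragment)


def datagram_survives(fragments: int, loss: float) -> float:
    """Chance that no fragment is lost, assuming independent losses."""
    return (1 - loss) ** fragments


if __name__ == "__main__":
    for size in (8192, 32768):
        for mtu in (1500, 9000):
            n = ip_fragments(size, mtu)
            print(f"rsize={size:>5} mtu={mtu}: {n:>2} fragments, "
                  f"P(intact at 0.1% loss)={datagram_survives(n, 0.001):.3f}")
```

Under these assumptions 32K over a 1500-byte MTU needs 23 fragments and 8K needs 6, while jumbo frames (MTU 9000) would bring 32K down to 4.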
* Re: NFS tuning - high performance throughput. 2005-06-15 17:47 ` Bill Rugolsky Jr. 2005-06-15 20:33 ` M. Todd Smith @ 2005-06-15 22:47 ` Greg Banks 1 sibling, 0 replies; 22+ messages in thread From: Greg Banks @ 2005-06-15 22:47 UTC (permalink / raw) To: Bill Rugolsky Jr.; +Cc: M. Todd Smith, nfs On Wed, Jun 15, 2005 at 01:47:01PM -0400, Bill Rugolsky Jr. wrote: > These days, I'd use TCP. Agreed. > Additionally, modern NICs like e1000 support > TSO (TCP Segmentation Offload), and though TSO has had its share of bugs, > it is the better path forward. Please don't tell him about TSO, it doesn't quite work yet ;-) > > RAID 5, 4k strip size, XFS file system. > > 4K? That's pretty tiny. It's extremely small. We don't use anything less than 64KiB. Use a larger stripe size, and tell XFS what stripe size you're using so it can align IOs correctly: RTFM about the options -d sunit, -d swidth, -l sunit, and -l version to mkfs.xfs. Also, make sure you align the start of the XFS filesystem to a RAID stripe width; this may require futzing with your volume manager config. > OTOH, using too large a stripe with NFS over RAID5 > can be no good either, if it results in partial writes that require a > read/modify/write cycle, so it is perhaps best not to go very large. It depends on your workload; for pure streaming workloads a larger stripe is generally better, up to a point determined by your filesystem, the amount of cache in your RAID controller, and other limitations. We have customer sites with 2MiB stripe sizes for local XFS filesystems (*not* for NFS service), and it works just fine. But beware, on an NFS server it's easier to get into the partial write case than with local IO. > You might want to compare a local sequential read test with > > /sbin/blockdev --setra {...,4096,8192,16384,...} <device> > > Traffic on the linux-lvm list suggests increasing the readahead on the > logical device, and decreasing it on the underlying physical devices, > but your mileage may vary.
Agreed, I would try tuning the logical block device's readahead upwards. > Experience with Ext3 data journaling indicates that dropping expire/writeback > can help to smooth out I/O: > > vm.dirty_expire_centisecs = {300-1000} > vm.dirty_writeback_centisecs = {50-100} > The performance limitation which is helped by tuning the VM to push dirty pages earlier is in NFS, not in the underlying filesystem, so this technique is useful with XFS too. > Again, I have no experience with XFS. Since it only does meta-data journaling, > (equivalent of Ext3 data=writeback), its performance characteristics are probably > quite different. XFS also does a bunch of clever things (which I don't really understand) to group IO going to disk and to limit metadata traffic for allocation. Greg. -- Greg Banks, R&D Software Engineer, SGI Australian Software Group. I don't speak for SGI.
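Greg's mkfs.xfs advice, sketched in Python: -d sunit/swidth and -l sunit are given in 512-byte units, swidth is sunit times the number of data disks, and a log stripe unit requires a version 2 log. The 256 KiB cap on the log stripe unit is an assumption to check against your mkfs.xfs man page. The 64 KiB chunk and the 8-disk RAID 5 (7 data disks) in the example are hypothetical, since the thread doesn't give the array geometry. stripe_aligned covers the "align the start of the filesystem to a stripe width" point.

```python
SECTOR = 512                          # mkfs.xfs sunit/swidth units (512-byte blocks)
LOG_SUNIT_MAX = 256 * 1024 // SECTOR  # assumed 256 KiB cap on the log stripe unit


def xfs_stripe_opts(chunk_bytes: int, data_disks: int) -> str:
    """mkfs.xfs options for a RAID with the given per-disk chunk size and
    number of data-bearing disks (RAID 5: total disks minus one)."""
    if chunk_bytes % SECTOR or data_disks < 1:
        raise ValueError("chunk must be a multiple of 512 bytes, data_disks >= 1")
    sunit = chunk_bytes // SECTOR
    swidth = sunit * data_disks           # one full stripe across the data disks
    log_sunit = min(sunit, LOG_SUNIT_MAX)
    return f"-d sunit={sunit},swidth={swidth} -l version=2,sunit={log_sunit}"


def stripe_aligned(start_bytes: int, chunk_bytes: int, data_disks: int) -> bool:
    """True if a partition or volume starts on a full-stripe boundary."""
    return start_bytes % (chunk_bytes * data_disks) == 0


if __name__ == "__main__":
    # Hypothetical array: 64 KiB chunks, 8-disk RAID 5 -> 7 data disks.
    print("mkfs.xfs", xfs_stripe_opts(64 * 1024, 7))
    print("start at sector 63 aligned?", stripe_aligned(63 * SECTOR, 64 * 1024, 7))
```

For that hypothetical geometry the classic DOS partition start at sector 63 is not stripe-aligned, which is the kind of thing the volume manager "futzing" has to fix.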
* Re: NFS tuning - high performance throughput. 2005-06-14 20:17 ` M. Todd Smith 2005-06-14 20:41 ` Bill Rugolsky Jr. @ 2005-06-14 20:50 ` Bill Rugolsky Jr. 2005-06-14 21:04 ` Chris Penney 2005-06-14 21:11 ` Roger Heflin 3 siblings, 0 replies; 22+ messages in thread From: Bill Rugolsky Jr. @ 2005-06-14 20:50 UTC (permalink / raw) To: M. Todd Smith; +Cc: nfs [Apologies for the resend; mutt chose the wrong From address the first time.] On Tue, Jun 14, 2005 at 04:17:48PM -0400, M. Todd Smith wrote: > We've recently upgraded our NFS server machine to a dual 3.2ghz Xeon > running Fedora Core 3 (kernel 2.6.11-1.14_FC3smp) w/ 4Gb RAM. Coupled > with this machine we have a 2 Broadcom NetExtreme 2 Port PCI-X NIC on > its own PCI-X bus (133Mhz). Attached to this machine is our fibre > channel SAN, using Seagate fibre-channel drives and an LSI dual channel > 2gigabit fibre adapter on its own PCI-X bus. Local RW is ~135Mbytes/sec. ... > I have read most of the tuning guides I can find on the net and > attempted just about everything I can get my hands on (I have not tried > jumbo frames yet, still waiting for some downtime to attempt that). My > problem is that no matter how I tune the machines I can get at max > 45Mb/ps throughput on NFS. This was the same throughput we were getting > with our old server with PCI cards, moreover this throughput is roughly > the same for every machine on our network. Theoretically we should be > able to get much higher values. I assume that you mean 45 MiB/s? Reading or writing? What are you using for testing? What are the file sizes? Have you validated network throughput using ttcp or netperf? You say that you've read the tuning guides, but you haven't told us what you have touched. Please tell us: o client-side NFS mount options o RAID configuration (level, stripe size, etc.) 
o I/O scheduler o queue depths (/sys/block/*/queue/nr_requests) o readahead (/sbin/blockdev --getra <device>) o mount options (e.g., are you using noatime) o filesystem type o journaling mode, if Ext3 or Reiserfs o journal size o internal or external journal o vm tunables: vm.dirty_writeback_centisecs vm.dirty_expire_centisecs vm.dirty_ratio vm.dirty_background_ratio vm.nr_pdflush_threads vm.vfs_cache_pressure Regards, Bill Rugolsky
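To collect the vm tunables on that list in one go, here is a Python sketch that reads them from /proc/sys/vm, which holds the same values sysctl reports. It returns None for entries the running kernel lacks; nr_pdflush_threads, for instance, may be absent on newer kernels. The proc_sys parameter exists only so the function can be pointed at a copy for testing, and the function name is hypothetical.

```python
from pathlib import Path
from typing import Dict, Optional

# The vm tunables asked about above; some may not exist on every kernel.
VM_TUNABLES = (
    "dirty_writeback_centisecs",
    "dirty_expire_centisecs",
    "dirty_ratio",
    "dirty_background_ratio",
    "nr_pdflush_threads",
    "vfs_cache_pressure",
)


def read_vm_tunables(proc_sys: str = "/proc/sys") -> Dict[str, Optional[str]]:
    """Return each tunable's current value as a string, or None if absent."""
    values: Dict[str, Optional[str]] = {}
    for name in VM_TUNABLES:
        path = Path(proc_sys) / "vm" / name
        try:
            values[f"vm.{name}"] = path.read_text().strip()
        except OSError:  # missing file or unreadable
            values[f"vm.{name}"] = None
    return values


if __name__ == "__main__":
    for key, val in read_vm_tunables().items():
        print(f"{key} = {val if val is not None else '(not present)'}")
```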
* Re: NFS tuning - high performance throughput. 2005-06-14 20:17 ` M. Todd Smith 2005-06-14 20:41 ` Bill Rugolsky Jr. 2005-06-14 20:50 ` Bill Rugolsky Jr. @ 2005-06-14 21:04 ` Chris Penney 2005-06-14 21:06 ` Chris Penney 2005-06-14 21:11 ` Roger Heflin 3 siblings, 1 reply; 22+ messages in thread From: Chris Penney @ 2005-06-14 21:04 UTC (permalink / raw) To: M. Todd Smith; +Cc: nfs > Local RW is ~135Mbytes/sec That seems slow for LSI disks. I use LSI disk (STK rebrand) and have one HBA connected to controller A and the other HBA to controller B.=20 I then have two 1TB luns primary on A and B (so a total of 4 TB). I get ~300MB/s. I'm wondering if you are striping across controllers and what file system you use (I use JFS). How are you chaining the luns into one volume? I use the device mapper (4 multipath devices =3D> 1 linear). > I have read most of the tuning guides I can find on the net and > attempted just about everything I can get my hands on (I have not tried > jumbo frames yet, still waiting for some downtime to attempt that). My > problem is that no matter how I tune the machines I can get at max > 45Mb/ps throughput on NFS.=20 That is pretty low, esp. if you are talking about reads, which I can get >100MB/s with only a single e1000 card. I use the following mount options: Client: nosuid,rw,bg,hard,intr,vers=3D3,proto=3Dtcp,rsize=3D32768,wsize=3D3= 2768 Server: rw,sync,no_subtree_check,no_root_squash I use 128 NFS threads (which in SuSE is set in /etc/sysconfig/nfs). In /etc/sysctl.conf I have (can't say how well tuned these are): net.core.rmem_default =3D 262144 net.core.wmem_default =3D 262144 net.core.rmem_max =3D 8388608 net.core.wmem_max =3D 8388608 net.ipv4.tcp_rmem =3D 4096 87380 8388608 net.ipv4.tcp_wmem =3D 4096 65536 8388608 net.ipv4.tcp_mem =3D 8388608 8388608 8388608 I've also found that enabling hyperthreading is a good thing for NFS.=20 Under load using etheral I show an improvement in write/read/commit latency using HT. I'm also using SLES 9. 
Chris
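Whether those large socket buffers matter on this LAN can be estimated from the bandwidth-delay product. Here is a Python sketch, assuming the ~200 microsecond ping time reported earlier in the thread and either one gigabit port or four bonded ones. It is arithmetic, not a measurement. One caution when reading the sysctl.conf values above: net.ipv4.tcp_mem is counted in pages, not bytes, unlike tcp_rmem and tcp_wmem.

```python
def bdp_bytes(link_bits_per_s: float, rtt_s: float) -> float:
    """Bandwidth-delay product: bytes that must be in flight to fill the pipe."""
    return link_bits_per_s / 8 * rtt_s


def parse_triple(value: str) -> tuple:
    """Split a 'min default max' sysctl value such as net.ipv4.tcp_rmem."""
    lo, default, hi = (int(field) for field in value.split())
    return lo, default, hi


if __name__ == "__main__":
    rtt = 200e-6  # ~200 microseconds, the LAN ping time quoted earlier in the thread
    for gbit in (1, 4):  # one e1000 port vs. four bonded ports
        print(f"{gbit} Gbit/s x 200 us -> {bdp_bytes(gbit * 1e9, rtt):,.0f} bytes")
    _, default, hi = parse_triple("4096 87380 8388608")  # tcp_rmem from above
    print(f"tcp_rmem default {default} bytes, max {hi} bytes")
```

Even four bonded gigabit links need only about 100 KB in flight at that latency. That is on the order of the 87380-byte default receive buffer and far below the 8 MiB maximum, so the larger maxima mainly pay off on higher-latency paths.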
* Re: NFS tuning - high performance throughput. 2005-06-14 21:04 ` Chris Penney @ 2005-06-14 21:06 ` Chris Penney 0 siblings, 0 replies; 22+ messages in thread From: Chris Penney @ 2005-06-14 21:06 UTC (permalink / raw) To: M. Todd Smith; +Cc: nfs > How are you chaining the luns into one volume? I use the device > mapper (4 multipath devices => 1 linear). Sorry, that should be 4 multipath => 1 striped. Chris
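The difference between a linear and a striped device-mapper layout shows up in where consecutive sectors land. This is a simplified Python model that omits the per-device start offsets real dm tables carry. The 64 KiB chunk and 1 TiB LUN size are hypothetical, chosen to resemble the setup described.

```python
from typing import Callable, Set, Tuple


def linear_map(sector: int, dev_sectors: int) -> Tuple[int, int]:
    """Linear concatenation: device 0 fills up first, then device 1, etc."""
    device, offset = divmod(sector, dev_sectors)
    return device, offset


def striped_map(sector: int, ndevs: int, chunk_sectors: int) -> Tuple[int, int]:
    """Striping: consecutive chunks rotate round-robin across the devices."""
    chunk, offset = divmod(sector, chunk_sectors)
    device = chunk % ndevs
    device_sector = (chunk // ndevs) * chunk_sectors + offset
    return device, device_sector


def devices_touched(start: int, count: int,
                    mapper: Callable[[int], Tuple[int, int]]) -> Set[int]:
    """Which devices a request covering [start, start + count) sectors hits."""
    return {mapper(s)[0] for s in range(start, start + count)}


if __name__ == "__main__":
    one_tib = 2**31  # sectors in a hypothetical 1 TiB LUN
    chunk = 128      # hypothetical 64 KiB stripe chunk
    req = 2048       # a 1 MiB sequential read
    print("linear :", devices_touched(0, req, lambda s: linear_map(s, one_tib)))
    print("striped:", devices_touched(0, req, lambda s: striped_map(s, 4, chunk)))
```

A 1 MiB sequential read stays on one LUN with the linear layout but is spread over all four with striping, which is why the correction matters for throughput comparisons.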
* RE: NFS tuning - high performance throughput. 2005-06-14 20:17 ` M. Todd Smith ` (2 preceding siblings ...) 2005-06-14 21:04 ` Chris Penney @ 2005-06-14 21:11 ` Roger Heflin 3 siblings, 0 replies; 22+ messages in thread From: Roger Heflin @ 2005-06-14 21:11 UTC (permalink / raw) To: 'M. Todd Smith', nfs Using Suse Enterprise 9.0 on a single-CPU Opteron server with a single Broadcom network interface, I can get 95 MiB/s writes and 115 MiB/s reads to a single test machine. The test machine is dual CPU; the server is single CPU and has 2GB of RAM. If you are getting 45MB/second, you don't need trunking. The local IO speed of our setup was around your 135MB/second on writes, more like 185MB/second on reads. A 32KB block size was the best; no other significant changes were made to NFS or to the kernel to get this number, and the network is clean. I have seen a SAN slow down from what I believe is overloading. How big a test did you run to get the 135MB/second local speed? My tests above were done with many, many TB of IO, so the cache was a non-issue. NFS normally runs sync, so cache is a non-issue, and sustained local performance is the real issue. What is your actual local SAN disk setup like? Roger Atipa Technologies > -----Original Message----- > From: nfs-admin@lists.sourceforge.net > [mailto:nfs-admin@lists.sourceforge.net] On Behalf Of M. Todd Smith > Sent: Tuesday, June 14, 2005 3:18 PM > To: nfs@lists.sourceforge.net > Subject: [NFS] NFS tuning - high performance throughput. > > Hello all, > > I've been attempting to get a better hold over our NFS > performance as our network grows and grows. > > We've recently upgraded our NFS server machine to a dual > 3.2ghz Xeon running Fedora Core 3 (kernel 2.6.11-1.14_FC3smp) > w/ 4Gb RAM. Coupled with this machine we have a 2 Broadcom > NetExtreme 2 Port PCI-X NIC on its own PCI-X bus (133Mhz).
> Attached to this machine is our fibre channel SAN, using > Seagate fibre-channel drives and an LSI dual channel 2gigabit > fibre adapter on its own PCI-X bus. Local RW is ~135Mbytes/sec. > > The 4 ports are trunked together using bonding (balanced > round robin mode), and trunked together on our Extreme > Networks Summit 400i switch. > All the test machines are attached to this switch, making > everything within one hop and ping times of less than a 200 > nanoseconds. > > Current test machine is running Suse 9.2 and has an Intel > 100/1000 XT server adapter (e1000 driver) on a shared but not > high traffic PCI bus. > Other test machines include some Dell PowerEdge 1850's with > onboard Intel NICs, and some Apple G4, G5 and Xserves. > > I have read most of the tuning guides I can find on the net > and attempted just about everything I can get my hands on (I > have not tried jumbo frames yet, still waiting for some > downtime to attempt that). My problem is that no matter how > I tune the machines I can get at max 45Mb/ps throughput on > NFS. This was the same throughput we were getting with our > old server with PCI cards, moreover this throughput is > roughly the same for every machine on our network. > Theoretically we should be able to get much higher values. > > Any idea as to why this is? I can provide config files and > such if needed, but I'm really at a loss as to where to start. > > Cheers > Todd > > -- > Systems Administrator > ---------------------------------- > Soho VFX - Visual Effects Studio > 99 Atlantic Avenue, Suite 303 > Toronto, Ontario, M6K 3J8 > (416) 516-7863 > http://www.sohovfx.com > ---------------------------------- > > > > > > > ------------------------------------------------------- > SF.Net email is sponsored by: Discover Easy Linux Migration > Strategies from IBM. Find simple to follow Roadmaps, > straightforward articles, informative Webcasts and more! Get > everything you need to get up to speed, fast. 
> http://ads.osdn.com/?ad_id=7477&alloc_id=16492&op=click > _______________________________________________ > NFS maillist - NFS@lists.sourceforge.net > https://lists.sourceforge.net/lists/listinfo/nfs >
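Roger's question about test size, and Bill's earlier advice to use several times the RAM, both come down to keeping the page cache out of the measurement. This is a minimal Python sketch, assuming a Linux/POSIX host. It times a sequential write including the final fsync, so the data must actually leave the client's cache before the clock stops. The 4x-RAM default and the helper names are assumptions for illustration. For a real test, point the path at the NFS mount (or the local SAN filesystem) and use the suggested size rather than the 16 MiB demo.

```python
import os
import tempfile
import time


def suggested_test_bytes(ram_bytes: int, multiple: int = 4) -> int:
    """Test-file size a few times larger than RAM, so the page cache can't hold it."""
    return ram_bytes * multiple


def sequential_write_mib_s(path: str, total_bytes: int, block: int = 1 << 20) -> float:
    """Write total_bytes sequentially, fsync, and return MiB/s.

    The clock stops only after fsync, so the data has to leave the page
    cache (for a local disk or an NFS mount alike) before the rate is computed.
    """
    buf = b"\0" * block
    start = time.perf_counter()
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        written = 0
        while written < total_bytes:
            # os.write may write fewer bytes than asked; loop until done.
            written += os.write(fd, buf[: min(block, total_bytes - written)])
        os.fsync(fd)
    finally:
        os.close(fd)
    elapsed = max(time.perf_counter() - start, 1e-9)
    return total_bytes / (1 << 20) / elapsed


if __name__ == "__main__":
    ram = os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES")
    print(f"suggested large-file test size: {suggested_test_bytes(ram) / 2**30:.1f} GiB")
    # Tiny demo run in a temp dir; point the path at the NFS mount for real tests.
    with tempfile.TemporaryDirectory() as d:
        rate = sequential_write_mib_s(os.path.join(d, "probe.bin"), 16 << 20)
        print(f"16 MiB write+fsync: {rate:.1f} MiB/s")
```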
Thread overview: 22+ messages
[not found] <482A3FA0050D21419C269D13989C611308539C89@lavender-fe.eng.netapp.com>
2005-06-14 20:38 ` NFS tuning - high performance throughput M. Todd Smith
2005-06-15 1:56 ` Dan Stromberg
2005-06-14 20:40 Lever, Charles
[not found] <20050610031144.4B9CA12F8C@sc8-sf-spam2.sourceforge.net>
2005-06-14 20:17 ` M. Todd Smith
2005-06-14 20:41 ` Bill Rugolsky Jr.
2005-06-14 22:49 ` M. Todd Smith
2005-06-15 13:03 ` Roger Heflin
2005-06-15 14:47 ` M. Todd Smith
2005-06-15 15:28 ` Roger Heflin
2005-06-15 19:13 ` Dan Stromberg
2005-06-15 19:52 ` Roger Heflin
2005-06-15 20:11 ` Dan Stromberg
2005-06-15 20:31 ` Roger Heflin
2005-06-15 20:33 ` Chris Penney
2005-06-15 17:47 ` Bill Rugolsky Jr.
2005-06-15 20:33 ` M. Todd Smith
2005-06-15 22:43 ` Bill Rugolsky Jr.
2005-06-15 22:47 ` Greg Banks
2005-06-14 20:50 ` Bill Rugolsky Jr.
2005-06-14 21:04 ` Chris Penney
2005-06-14 21:06 ` Chris Penney
2005-06-14 21:11 ` Roger Heflin