Hi Wendy-

Wendy Cheng wrote:
> Peter Staubach wrote:
>> Wendy Cheng wrote:
>>>
>>>> top - 15:50:56 up 20 days,  1:33,  9 users,  load average: 3.42, 2.95, 2.38
>>>>
>>>> 19200 geo0501   15   0 75076 5224 3480 S    2  0.1   0:07.94 smbd
>>>>  2336 root      10  -5     0    0    0 S    1  0.0  57:07.70 kjournald
>>>>  2334 root      10  -5     0    0    0 S    1  0.0  33:19.89 kjournald
>>>>  2279 root      10  -5     0    0    0 S    0  0.0  15:10.98 md0_raid1
>>>>  2283 root      10  -5     0    0    0 S    0  0.0  24:45.79 md1_raid1
>>>>  3935 root      15   0     0    0    0 S    0  0.0  14:04.25 nfsd
>>>>  3943 root      15   0     0    0    0 S    0  0.0  14:18.43 nfsd
>>>>  3947 root      15   0     0    0    0 S    0  0.0  13:57.06 nfsd
>>>>  8325 ed0127    15   0 75044 4812 3264 S    0  0.1   0:01.29 smbd
>>> Intuitively (based on ext3's journal threads info above) I would
>>> suspect this is due to the change of the export default option from
>>> "async" to "sync" between 2.6.9 and 2.6.18 kernels. So go to your
>>> /etc/exports file and explicitly set the export option to "async" to
>>> see whether you can get the performance back.
>>>
>>> e.g. changes "/server *(rw)" to "/server *(async, rw)".
>> While this may or may not restore your performance aspects, it
>> is not safe to make this change. The change was made for a
>> reason.
>
> Not to start a flame war :) but please read his email. His *old* system,
> that uses "async" option", has been running fine for several years. Why
> all of sudden, an "async" option is such a big issue ?

That means his old system would have been exposed to data corruption
if it had crashed (panic, power outage, etc).

"sync" became the default because "async" is inherently careless about
data integrity. The data loss is often entirely silent. This is
explained in the Linux NFS FAQ, question B6. See

  http://nfs.sourceforge.net/index.php#faq_b6

It's another case where we performed better in older kernels but are
more correct in recent kernels... but our users don't appreciate the
correctness improvement :-)

> -- Wendy
>> Please any and all other possibilities before making this change.
>> It is not free.

It's a reasonable *experiment* to try adding the "async" export option.
That would identify the source of the performance loss. It's almost
never a good choice to use "async" in production.
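
For anyone who wants to run that experiment and then switch back cleanly, here is a minimal Python sketch that rewrites a single /etc/exports line so the write mode is spelled out explicitly. The helper name set_write_mode is hypothetical (it is not part of nfs-utils), and it assumes a simple line of the form "path client(options)" with no comments or line continuations.

```python
import re

# Hypothetical helper, not part of nfs-utils: rewrite one /etc/exports line
# so that every client option list names "sync" or "async" explicitly.
# Assumes a simple "path client(opts) [client(opts) ...]" line.
_CLIENT = re.compile(r"(\S+?)\(([^)]*)\)")


def set_write_mode(line: str, mode: str) -> str:
    if mode not in ("sync", "async"):
        raise ValueError("mode must be 'sync' or 'async'")

    def rewrite(match):
        client, raw = match.group(1), match.group(2)
        # Split the option list and drop any existing sync/async entry,
        # so the result carries exactly one write-mode option.
        opts = [o.strip() for o in raw.split(",") if o.strip()]
        opts = [o for o in opts if o not in ("sync", "async")]
        opts.append(mode)
        return f"{client}({','.join(opts)})"

    return _CLIENT.sub(rewrite, line)


# Turn the experiment on, then restore the safe setting afterwards.
print(set_write_mode("/server *(rw)", "async"))       # /server *(rw,async)
print(set_write_mode("/server *(rw,async)", "sync"))  # /server *(rw,sync)
```

The sketch only edits text. The server still has to re-read the file (typically with "exportfs -ra") before the change takes effect, and clients given without a parenthesised option list are left untouched.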