* Re: [NFS] NFS Digest, Vol 18, Issue 70 (NFS performance problems)
       [not found] <mailman.194659.1195581701.26582.nfs@lists.sourceforge.net>
@ 2007-11-20 20:22 ` Wendy Cheng
       [not found]   ` <BAY104-W43766F661E50E7FFCDA8EBA7F0-MsuGFMq8XAE@public.gmane.org>
  0 siblings, 1 reply; 17+ messages in thread
From: Wendy Cheng @ 2007-11-20 20:22 UTC
  To: rps-G7wE2RHCdwNeoWH0uzbU5w; +Cc: nfs, wcheng



> 
> top - 15:50:56 up 20 days,  1:33,  9 users,  load average: 3.42, 2.95, 2.38
> 
> 19200 geo0501   15   0 75076 5224 3480 S    2  0.1   0:07.94 smbd               
>  2336 root      10  -5     0    0    0 S    1  0.0  57:07.70 kjournald          
>  2334 root      10  -5     0    0    0 S    1  0.0  33:19.89 kjournald          
>  2279 root      10  -5     0    0    0 S    0  0.0  15:10.98 md0_raid1          
>  2283 root      10  -5     0    0    0 S    0  0.0  24:45.79 md1_raid1          
>  3935 root      15   0     0    0    0 S    0  0.0  14:04.25 nfsd               
>  3943 root      15   0     0    0    0 S    0  0.0  14:18.43 nfsd               
>  3947 root      15   0     0    0    0 S    0  0.0  13:57.06 nfsd               
>  8325 ed0127    15   0 75044 4812 3264 S    0  0.1   0:01.29 smbd            

Intuitively (based on ext3's journal threads info above) I would suspect this is due to the change of the export default option from "async" to "sync" between 2.6.9 and 2.6.18 kernels.  So go to your /etc/exports file and explicitly set the export option to "async" to see whether you can get the performance back.

e.g. change "/server *(rw)" to "/server *(rw,async)".

-- Wendy


> 
> This server  was installed with Suse 9.1 (linux 2.6.4-52, nfs-utils-1.0.6-103)
> and worked well for a couple of years.
> Then I had to upgrade it (since 9.1 was no longer being updated) and I
> installed 10.1 (linux 2.6.18.8-0.7, nfs-utils-1.0.10-22).
> 
> This upgrade caused some serious performance problems (I suspect related
> to locks).  The most obvious problem is that it can take a long time
> for a KDE user to log in (5 minutes or more!) when a number of users
> log in at the same time (e.g. a classroom with 16 students). KDE seems
> to spend that time writing and rewriting files in ~/.kde/share/.
> 
> I know that the problem is related to the change of version, because
> I could boot the old system (it is still installed in another partition)
> and then it worked well again. Samba doesn't seem to be affected, since
> the Windows users don't complain.
> 
> I tried to ask on the suse mailing list and newsgroup if someone else
> had the same problem, or if someone had a similar setup working well,
> but I didn't get any useful answer. 
> 
> 
> I don't know what to do about this. Things I thought of trying:
> 
>  - install 10.2 (from scratch)
> 
> I did this, and not only did it not solve the problem, but now I also
> can't boot the 9.1 version, because fsck complains that the file
> systems are using features that it doesn't understand.
> 
>  - downgrade to 10.0
> 
>  - downgrade to 9.3 or 9.1 (but since these are no longer being updated,
>    I have to put the server in a private network, close ssh, etc. and
>    even then it is somewhat dangerous.)
> 
> 
> I also installed another PC with all these versions and I tried to
> check which versions have problems. But the tests I did with this
> computer were not conclusive. Maybe because it is not the same
> hardware (32-bit version) or because I can't duplicate the same load.
> 
>  
> 
> Thanks in advance for any suggestions
> 
> -- 
> rps
> 
> 
> 
-------------------------------------------------------------------------
This SF.net email is sponsored by: Microsoft
Defy all challenges. Microsoft(R) Visual Studio 2005.
http://clk.atdmt.com/MRT/go/vse0120000070mrt/direct/01/
_______________________________________________
NFS maillist  -  NFS@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nfs
_______________________________________________
Please note that nfs@lists.sourceforge.net is being discontinued.
Please subscribe to linux-nfs@vger.kernel.org instead.
    http://vger.kernel.org/vger-lists.html#linux-nfs



* Re: [NFS] NFS Digest, Vol 18, Issue 70 (NFS performance problems)
       [not found]   ` <BAY104-W43766F661E50E7FFCDA8EBA7F0-MsuGFMq8XAE@public.gmane.org>
@ 2007-11-20 21:17     ` Peter Staubach
  2007-11-20 21:23       ` Wendy Cheng
  0 siblings, 1 reply; 17+ messages in thread
From: Peter Staubach @ 2007-11-20 21:17 UTC
  To: Wendy Cheng; +Cc: nfs, wcheng

Wendy Cheng wrote:
>   
>> top - 15:50:56 up 20 days,  1:33,  9 users,  load average: 3.42, 2.95, 2.38
>>
>> 19200 geo0501   15   0 75076 5224 3480 S    2  0.1   0:07.94 smbd               
>>  2336 root      10  -5     0    0    0 S    1  0.0  57:07.70 kjournald          
>>  2334 root      10  -5     0    0    0 S    1  0.0  33:19.89 kjournald          
>>  2279 root      10  -5     0    0    0 S    0  0.0  15:10.98 md0_raid1          
>>  2283 root      10  -5     0    0    0 S    0  0.0  24:45.79 md1_raid1          
>>  3935 root      15   0     0    0    0 S    0  0.0  14:04.25 nfsd               
>>  3943 root      15   0     0    0    0 S    0  0.0  14:18.43 nfsd               
>>  3947 root      15   0     0    0    0 S    0  0.0  13:57.06 nfsd               
>>  8325 ed0127    15   0 75044 4812 3264 S    0  0.1   0:01.29 smbd            
>>     
>
> Intuitively (based on ext3's journal threads info above) I would suspect this is due to the change of the export default option from "async" to "sync" between 2.6.9 and 2.6.18 kernels.  So go to your /etc/exports file and explicitly set the export option to "async" to see whether you can get the performance back.
>
> e.g. change "/server *(rw)" to "/server *(rw,async)".

While this may or may not restore your performance aspects, it
is not safe to make this change.  The change was made for a
reason.

Please try any and all other possibilities before making this change.
It is not free.

    Thanx...

       ps


* Re: [NFS] NFS Digest, Vol 18, Issue 70 (NFS performance problems)
  2007-11-20 21:17     ` Peter Staubach
@ 2007-11-20 21:23       ` Wendy Cheng
  2007-11-21 16:04         ` Chuck Lever
  0 siblings, 1 reply; 17+ messages in thread
From: Wendy Cheng @ 2007-11-20 21:23 UTC
  To: Peter Staubach; +Cc: Wendy Cheng, nfs

Peter Staubach wrote:
> Wendy Cheng wrote:
>>  
>>> top - 15:50:56 up 20 days,  1:33,  9 users,  load average: 3.42, 2.95, 2.38
>>>
>>> 19200 geo0501   15   0 75076 5224 3480 S    2  0.1   0:07.94 smbd
>>>  2336 root      10  -5     0    0    0 S    1  0.0  57:07.70 kjournald
>>>  2334 root      10  -5     0    0    0 S    1  0.0  33:19.89 kjournald
>>>  2279 root      10  -5     0    0    0 S    0  0.0  15:10.98 md0_raid1
>>>  2283 root      10  -5     0    0    0 S    0  0.0  24:45.79 md1_raid1
>>>  3935 root      15   0     0    0    0 S    0  0.0  14:04.25 nfsd
>>>  3943 root      15   0     0    0    0 S    0  0.0  14:18.43 nfsd
>>>  3947 root      15   0     0    0    0 S    0  0.0  13:57.06 nfsd
>>>  8325 ed0127    15   0 75044 4812 3264 S    0  0.1   0:01.29 smbd
>>
>> Intuitively (based on ext3's journal threads info above) I would 
>> suspect this is due to the change of the export default option from 
>> "async" to "sync" between 2.6.9 and 2.6.18 kernels.  So go to your 
>> /etc/exports file and explicitly set the export option to "async" to 
>> see whether you can get the performance back.
>>
>> e.g. change "/server *(rw)" to "/server *(rw,async)".
>
> While this may or may not restore your performance aspects, it
> is not safe to make this change.  The change was made for a
> reason.
Not to start a flame war :) but please read his email. His *old* system,
which uses the "async" option, has been running fine for several years.
Why, all of a sudden, is the "async" option such a big issue?

-- Wendy
>
> Please try any and all other possibilities before making this change.
> It is not free.
>
>    Thanx...
>
>       ps



* Re: [NFS] NFS Digest, Vol 18, Issue 70 (NFS performance problems)
  2007-11-20 21:23       ` Wendy Cheng
@ 2007-11-21 16:04         ` Chuck Lever
  2007-11-26  3:28           ` Wendy Cheng
  2007-11-27 18:40           ` Rui Pedro Mendes Salgueiro
  0 siblings, 2 replies; 17+ messages in thread
From: Chuck Lever @ 2007-11-21 16:04 UTC
  To: Wendy Cheng; +Cc: Peter Staubach, Wendy Cheng, nfs

[-- Attachment #1: Type: text/plain, Size: 2544 bytes --]

Hi Wendy-

Wendy Cheng wrote:
> Peter Staubach wrote:
>> Wendy Cheng wrote:
>>>  
>>>> top - 15:50:56 up 20 days,  1:33,  9 users,  load average: 3.42, 2.95, 2.38
>>>>
>>>> 19200 geo0501   15   0 75076 5224 3480 S    2  0.1   0:07.94 smbd
>>>>  2336 root      10  -5     0    0    0 S    1  0.0  57:07.70 kjournald
>>>>  2334 root      10  -5     0    0    0 S    1  0.0  33:19.89 kjournald
>>>>  2279 root      10  -5     0    0    0 S    0  0.0  15:10.98 md0_raid1
>>>>  2283 root      10  -5     0    0    0 S    0  0.0  24:45.79 md1_raid1
>>>>  3935 root      15   0     0    0    0 S    0  0.0  14:04.25 nfsd
>>>>  3943 root      15   0     0    0    0 S    0  0.0  14:18.43 nfsd
>>>>  3947 root      15   0     0    0    0 S    0  0.0  13:57.06 nfsd
>>>>  8325 ed0127    15   0 75044 4812 3264 S    0  0.1   0:01.29 smbd
>>> Intuitively (based on ext3's journal threads info above) I would 
>>> suspect this is due to the change of the export default option from 
>>> "async" to "sync" between 2.6.9 and 2.6.18 kernels.  So go to your 
>>> /etc/exports file and explicitly set the export option to "async" to 
>>> see whether you can get the performance back.
>>>
>>> e.g. change "/server *(rw)" to "/server *(rw,async)".
>> While this may or may not restore your performance aspects, it
>> is not safe to make this change.  The change was made for a
>> reason.
>
> Not to start a flame war :) but please read his email. His *old* system,
> which uses the "async" option, has been running fine for several years.
> Why, all of a sudden, is the "async" option such a big issue?

That means his old system would have been exposed to data corruption 
issues if it crashes (panic, power outage, etc).  Using "sync" became 
default because async is inherently careless about data integrity.  The 
data loss is often entirely silent.

This is explained in the Linux NFS FAQ, question B6.

See http://nfs.sourceforge.net/index.php#faq_b6

It's another case of where we perform better in older kernels but we are 
more correct in recent kernels... but our users don't appreciate the 
correctness improvement :-)

> -- Wendy
>> Please try any and all other possibilities before making this change.
>> It is not free.

It's a reasonable *experiment* to try adding the "async" export option.
That would identify the source of the performance loss.  It's almost
never a good choice to use "async" in production.

[-- Attachment #2: chuck.lever.vcf --]
[-- Type: text/x-vcard, Size: 315 bytes --]

begin:vcard
fn:Chuck Lever
n:Lever;Chuck
org:Oracle Corporation;Corporate Architecture: Linux Projects Group
adr:;;1015 Granger Avenue;Ann Arbor;MI;48104;USA
email;internet:chuck dot lever at nospam oracle dot com
title:Principal Member of Staff
tel;work:+1 248 614 5091
x-mozilla-html:FALSE
version:2.1
end:vcard



* Re: [NFS] NFS Digest, Vol 18, Issue 70 (NFS performance problems)
  2007-11-21 16:04         ` Chuck Lever
@ 2007-11-26  3:28           ` Wendy Cheng
  2007-11-26  4:41             ` Trond Myklebust
  2007-11-26  5:02             ` J. Bruce Fields
  2007-11-27 18:40           ` Rui Pedro Mendes Salgueiro
  1 sibling, 2 replies; 17+ messages in thread
From: Wendy Cheng @ 2007-11-26  3:28 UTC
  To: chuck.lever; +Cc: nfs

Chuck Lever wrote:

> Hi Wendy-
>
> That means his old system would have been exposed to data corruption 
> issues if it crashes (panic, power outage, etc).  Using "sync" became 
> default because async is inherently careless about data integrity.  
> The data loss is often entirely silent.
>
> This is explained in the Linux NFS FAQ, question B6.
>
> See http://nfs.sourceforge.net/index.php#faq_b6


Setting aside NFS for a moment...  for a locally mounted filesystem, the
file data stays in the cache until write-back occurs. If the system
crashes, that data can always be lost. Journaling filesystems such as
EXT3 can only guarantee that metadata is not corrupted; there is no
guarantee that file data is saved unless the filesystem is mounted with
the "sync" option. Because of the non-trivial performance hit, filesystems
are rarely mounted with "sync". Applications normally understand the
problem and, whenever required, use fsync() and/or similar mechanisms.

Reading the FAQ, I assume that something has been done (particularly on
the client side) to alleviate the performance hit, given that Linux NFS
servers deviate from this common practice. Could you elaborate on
this?

Again, I'm not trying to argue and/or start a flamewar. I have a need to 
understand more about this issue. The "sync" operation is very expensive 
for us (cluster filesystem) and I'm under the gun to improve our NFS 
file serving performance at this moment.

-- Wendy




* Re: [NFS] NFS Digest, Vol 18, Issue 70 (NFS performance problems)
  2007-11-26  3:28           ` Wendy Cheng
@ 2007-11-26  4:41             ` Trond Myklebust
  2007-11-26  5:02             ` J. Bruce Fields
  1 sibling, 0 replies; 17+ messages in thread
From: Trond Myklebust @ 2007-11-26  4:41 UTC
  To: wcheng; +Cc: chuck.lever, nfs


On Sun, 2007-11-25 at 22:28 -0500, Wendy Cheng wrote:
> Setting aside NFS for a moment...  for a locally mounted filesystem, the
> file data stays in the cache until write-back occurs. If the system
> crashes, that data can always be lost. Journaling filesystems such as
> EXT3 can only guarantee that metadata is not corrupted; there is no
> guarantee that file data is saved unless the filesystem is mounted with
> the "sync" option. Because of the non-trivial performance hit, filesystems
> are rarely mounted with "sync". Applications normally understand the
> problem and, whenever required, use fsync() and/or similar mechanisms.
> 
> Reading the FAQ, I assume that something has been done (particularly on
> the client side) to alleviate the performance hit, given that Linux NFS
> servers deviate from this common practice. Could you elaborate on
> this?
> 
> Again, I'm not trying to argue and/or start a flamewar. I have a need to 
> understand more about this issue. The "sync" operation is very expensive 
> for us (cluster filesystem) and I'm under the gun to improve our NFS 
> file serving performance at this moment.

You've got it wrong. The 'async' option was the Linux-specific option
that violates the NFS spec, not 'sync'.

Please read the RFCs: NFS has always imposed strict requirements on the
server w.r.t. data integrity. 'async' violates those requirements
because it allows the server to cache data in circumstances where the
client is under the belief that the data is on permanent storage.

Trond



* Re: [NFS] NFS Digest, Vol 18, Issue 70 (NFS performance problems)
  2007-11-26  3:28           ` Wendy Cheng
  2007-11-26  4:41             ` Trond Myklebust
@ 2007-11-26  5:02             ` J. Bruce Fields
       [not found]               ` <18254.19187.470275.538680@notabene.brown>
  1 sibling, 1 reply; 17+ messages in thread
From: J. Bruce Fields @ 2007-11-26  5:02 UTC
  To: Wendy Cheng; +Cc: chuck.lever, nfs

On Sun, Nov 25, 2007 at 10:28:43PM -0500, Wendy Cheng wrote:
> Chuck Lever wrote:
> 
> > Hi Wendy-
> >
> > That means his old system would have been exposed to data corruption 
> > issues if it crashes (panic, power outage, etc).  Using "sync" became 
> > default because async is inherently careless about data integrity.  
> > The data loss is often entirely silent.
> >
> > This is explained in the Linux NFS FAQ, question B6.
> >
> > See http://nfs.sourceforge.net/index.php#faq_b6
> 
> 
> Setting aside NFS for a moment...  for a locally mounted filesystem, the
> file data stays in the cache until write-back occurs. If the system
> crashes, that data can always be lost. Journaling filesystems such as
> EXT3 can only guarantee that metadata is not corrupted; there is no
> guarantee that file data is saved unless the filesystem is mounted with
> the "sync" option. Because of the non-trivial performance hit, filesystems
> are rarely mounted with "sync". Applications normally understand the
> problem and, whenever required, use fsync() and/or similar mechanisms.

As far as I know, even an explicit fsync() is ineffective in the case of
NFSv2 when the async export option is set.  (With v3 and v4 I think it
still works, since, from a quick check of the code, the server does
respect the stable flag even on async exports.)

An application on a local disk goes down when the system goes down,
whereas an NFS server can reboot without the applications using it
exiting.

So while a well-designed application might be built to deal with the
situation where a mkdir() that a previous instance performed is no
longer there when it starts up again, it may not be ready to deal with a
directory it just created simply disappearing out from under it while
it's running.

(Stupid question: what would it take to give NFS the equivalent to
COMMIT for directory operations?)

--b.


* Re: [NFS] NFS Digest, Vol 18, Issue 70 (NFS performance problems)
  2007-11-21 16:04         ` Chuck Lever
  2007-11-26  3:28           ` Wendy Cheng
@ 2007-11-27 18:40           ` Rui Pedro Mendes Salgueiro
       [not found]             ` <20071127184050.GA13791-/uZbGGhPd8Gmv2Aub4FSgw@public.gmane.org>
  1 sibling, 1 reply; 17+ messages in thread
From: Rui Pedro Mendes Salgueiro @ 2007-11-27 18:40 UTC
  To: nfs

Thanks for the replies, everyone. Last week I didn't have the time to
send this mail. Some comments below:

On Wed, Nov 21, 2007 at 11:04:55AM -0500, Chuck Lever wrote:
> Wendy Cheng wrote:
> >Peter Staubach wrote:
> >>Wendy Cheng wrote:

> >>>Intuitively (based on ext3's journal threads info above) I would 
> >>>suspect this is due to the change of the export default option from 
> >>>"async" to "sync" between 2.6.9 and 2.6.18 kernels.

The problem with that idea is that my /etc/exports file has (always?) had
the "sync" option set. (I checked a backup file from 2005-03-14.) I think
the SUSE management software already used the "sync" option in
/etc/exports. (Probably, when I first moved the server to Linux, I used
whatever YaST chose as the default.)

The default export might have been "async", but unless the option "sync"
in /etc/exports was being ignored I was already using "sync". Nevertheless
I will try to change to async and test if it makes a difference.

(one day later: )

I have now tried it and the load on the NFS server is much lower and KDE
logins seem to be reasonably fast now.

This doesn't mean that the drop in performance was due to "sync" versus
"async".  That is, the old version could really have been using "sync",
and some other change (not a change from "async" to "sync") could have
made the performance drop a lot between those versions.

One thing I suspected was quotas, since the old version didn't seem to
handle them. But that was the first thing I tried: I turned off quotas
to see if it made a difference. It didn't.

BTW, part of the problem is due to KDE doing a lot of file activity.
I already knew that fvwm (what I personally use) did not take a long
time to log in, and I have now tried GNOME, which also started fast enough
(with "sync"). But KDE used to work...

> >>> So go to your 
> >>>/etc/exports file and explicitly set the export option to "async" to 
> >>>see whether you can get the performance back.

> >>While this may or may not restore your performance aspects, it
> >>is not safe to make this change.  The change was made for a
> >>reason.

> That means his old system would have been exposed to data corruption 
> issues if it crashes (panic,

Luckily it has been reliable. Some years ago a previous server crashed
a lot, but that was due to an obscure bug (XFS + SMP kernel + NFS = crash,
IIRC), and I don't know whether it was ever fixed:

http://groups.google.com/group/alt.os.linux.suse/browse_frm/thread/f24dd8f878bb3ea3/7e6ffa45f3873716?hl=en&lnk=st#7e6ffa45f3873716

> power outage,

Of course, the server is on a UPS.

And of course, some hours after I wrote the above, the UPS had a hiccup
and the server crashed in the middle of the night.  We had to change
its batteries.

> It's another case of where we perform better in older kernels but we are 
> more correct in recent kernels... but our users don't appreciate the 
> correctness improvement :-)

The correctness improvement doesn't matter if the performance is so low
that you can't use it. I hope this has solved the problem, because I was
getting desperate. I had thought about abandoning NFS on Linux and trying
OpenBSD, but so many things would be different (RAID, filesystems, backups)
that I really didn't want to.

BTW, is what I am doing rare? I have about 50 Linux computers (including
the mail server) mounting user areas from the NFS server. (Most of the
time only some of them are being used.) The users mostly use KDE (because
it has been the default option in SUSE for the past few years). This
sort of setup allows a user to log in to any of the computers and to
have the same environment, so I would expect it to be widely used. But
when I asked about this in other places I never got a reply of the kind
"I am doing the same, and it works for me".



Thanks again.
-- 
rps


* Re: [NFS] NFS Digest, Vol 18, Issue 70 (NFS performance problems)
       [not found]             ` <20071127184050.GA13791-/uZbGGhPd8Gmv2Aub4FSgw@public.gmane.org>
@ 2007-11-28  1:50               ` Chuck Lever
  0 siblings, 0 replies; 17+ messages in thread
From: Chuck Lever @ 2007-11-28  1:50 UTC
  To: Rui Pedro Mendes Salgueiro; +Cc: nfs

On Nov 27, 2007, at 1:40 PM, Rui Pedro Mendes Salgueiro wrote:
> Thanks for the replies, everyone. Last week I didn't have the time to
> send this mail. Some comments below:
>
> On Wed, Nov 21, 2007 at 11:04:55AM -0500, Chuck Lever wrote:
>> Wendy Cheng wrote:
>>> Peter Staubach wrote:
>>>> Wendy Cheng wrote:

>>>>> So go to your /etc/exports file and explicitly set the export option
>>>>> to "async" to see whether you can get the performance back.
>
>>>> While this may or may not restore your performance aspects, it
>>>> is not safe to make this change.  The change was made for a
>>>> reason.
>
>> That means his old system would have been exposed to data corruption
>> issues if it crashes (panic,
>
> Luckily it has been reliable. Some years ago a previous server crashed
> a lot, but that was due to an obscure bug (XFS + SMP kernel + NFS = crash,
> IIRC), and I don't know whether it was ever fixed:
>
> http://groups.google.com/group/alt.os.linux.suse/browse_frm/thread/f24dd8f878bb3ea3/7e6ffa45f3873716?hl=en&lnk=st#7e6ffa45f3873716
>
>> power outage,
>
> Of course, the server is on a UPS.
>
> And of course, some hours after I wrote the above, the UPS had a hiccup
> and the server crashed in the middle of the night.  We had to change
> its batteries.
>
>> It's another case of where we perform better in older kernels but we are
>> more correct in recent kernels... but our users don't appreciate the
>> correctness improvement :-)
>
> The correctness improvement doesn't matter if the performance is so low
> that you can't use it.

I was being a bit facetious.

We won't ever make something "so correct it performs terribly" (on  
purpose, anyway).  There have been many cases where performance  
regressed significantly in certain corner cases where we don't have  
adequate testing, however.

In this case, async behavior in the worst case was so egregious that  
it had to be changed.  For NFSv3, the use of UNSTABLE writes usually  
mitigates the performance lost by using the "sync" export option.

> BTW, is what I am doing rare? I have about 50 Linux computers (including
> the mail server) mounting user areas from the NFS server. (Most of the
> time only some of them are being used.) The users mostly use KDE (because
> it has been the default option in SUSE for the past few years). This
> sort of setup allows a user to log in to any of the computers and to
> have the same environment, so I would expect it to be widely used. But
> when I asked about this in other places I never got a reply of the kind
> "I am doing the same, and it works for me".

50 clients shouldn't be a strain for the protocol itself.  However,
your server may be just powerful enough for the old load, with the
extra file activity during KDE login being just enough to push it over
the edge and make it unusable.

Further analysis of exactly how the clients are now behaving during  
login might be helpful in diagnosing how the server needs to change  
to handle the new load.  The fact that switching to "async" made a  
difference suggests that the new KDE login process adds a healthy  
write workload (as opposed to adding more READs or GETATTRs).  That  
is a helpful clue!

RAID 5 and NFS are particularly finicky together.  Lots of small  
random access writes, for instance, will quickly drive RAID 5 into  
the ground.

--
Chuck Lever
chuck[dot]lever[at]oracle[dot]com


* Re: [NFS] NFS Digest, Vol 18, Issue 70 (NFS performance problems)
       [not found]                 ` <18254.19187.470275.538680-wvvUuzkyo1EYVZTmpyfIwg@public.gmane.org>
@ 2007-11-29  5:30                   ` Trond Myklebust
       [not found]                     ` <1196314230.7950.42.camel-rJ7iovZKK19ZJLDQqaL3InhyD016LWXt@public.gmane.org>
  0 siblings, 1 reply; 17+ messages in thread
From: Trond Myklebust @ 2007-11-29  5:30 UTC
  To: NeilBrown; +Cc: J. Bruce Fields, chuck.lever, nfs, Wendy Cheng


On Thu, 2007-11-29 at 16:15 +1100, NeilBrown wrote:
> On Monday November 26, bfields@fieldses.org wrote:
> > 
> > (Stupid question: what would it take to give NFS the equivalent to
> > COMMIT for directory operations?)
> 
> Interesting question.
> 
> I guess the granularity would have to be per-directory.  I think
> programs that need transactional behaviour for directory operations
> already need to call fsync on the directory, so that should be
> consistent with the current API.
> 
> If DIR_COMMIT says "sorry, the server crashed", you need to replay the
> directory operations, which might be tricky in a number of cases.
> e.g. how do you replay a CREATE and be sure of getting the same
> fileid?
> How can you replay an UNLINK and be sure you deleted the right file
> and not some other file that some other client created since your
> UNLINK?
> 
> It would probably be possible to manage something, but I don't think
> it would be as "simple" as COMMIT.
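For reference, here is roughly what the fsync-on-the-directory idiom Neil mentions looks like on a local Linux filesystem. This is a sketch with a hypothetical helper name. Opening a directory read-only and calling fsync() on it works on Linux, but it is not guaranteed to work on every platform.

```python
import os
import tempfile


def create_durably(dirpath, name, data):
    """Create a new file, then fsync both the file and its parent directory."""
    path = os.path.join(dirpath, name)
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o644)
    try:
        view = memoryview(data)
        while view:                   # os.write may write only part of the buffer
            view = view[os.write(fd, view):]
        os.fsync(fd)                  # flush the file's data and inode
    finally:
        os.close(fd)
    dfd = os.open(dirpath, os.O_RDONLY)
    try:
        os.fsync(dfd)                 # flush the new directory entry (Linux idiom)
    finally:
        os.close(dfd)
    return path


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        print(create_durably(d, "foo", b"hello\n"))
```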

Actually, the real problem would be dealing with something like
unlink('foo') followed by open('foo', O_CREAT|O_EXCL). How do you ensure
that a replay of those actions following a reboot is fully consistent in
the face of some other client attempting an open('foo', O_CREAT) at the
same time?

The problem is that a number of directory operations involve exclusive
semantics, and so cannot be replayed. The solution to this sort of
problem is going to have to involve exclusive (i.e. write) directory
delegations to ensure that whatever transactions one client performs
cannot interfere with the transactions performed by another.
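A small local illustration shows why exclusive create cannot be replayed. The first open('foo', O_CREAT|O_EXCL) succeeds, but an identical replay fails with EEXIST. From the error alone, the caller cannot tell whether its own earlier request or some other client created the file. Here a temporary local directory stands in for the NFS directory, and the helper name is hypothetical.

```python
import os
import tempfile


def exclusive_create(path):
    """open(path, O_CREAT|O_EXCL): True if this call created it, False on EEXIST."""
    try:
        fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o644)
    except FileExistsError:
        return False
    os.close(fd)
    return True


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        foo = os.path.join(d, "foo")
        print(exclusive_create(foo))   # original request
        print(exclusive_create(foo))   # identical replay gets EEXIST
```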

Trond




^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [NFS] NFS Digest, Vol 18, Issue 70 (NFS performance problems)
       [not found]                     ` <1196314230.7950.42.camel-rJ7iovZKK19ZJLDQqaL3InhyD016LWXt@public.gmane.org>
@ 2007-11-30 16:27                       ` Wendy Cheng
  2007-12-03 20:31                         ` J. Bruce Fields
  0 siblings, 1 reply; 17+ messages in thread
From: Wendy Cheng @ 2007-11-30 16:27 UTC (permalink / raw)
  To: Trond Myklebust; +Cc: J. Bruce Fields, chuck.lever, nfs, NeilBrown

Trond Myklebust wrote:
> Actually, the real problem would be dealing with something like
> unlink('foo') followed by open('foo', O_CREAT|O_EXCL). How do you ensure
> that a replay of those actions following a reboot is fully consistent in
> the face of some other client attempting an open('foo', O_CREAT) at the
> same time?
>
> The problem is that a number of directory operations involve exclusive
> semantics, and so cannot be replayed. The solution to this sort of
> problem is going to have to involve exclusive (i.e. write) directory
> delegations to ensure that whatever transactions one client performs
> cannot interfere with the transactions performed by another.
>
>   

Well, a dumb question from me (borrowing Bruce's line :) ) ... even with
"sync" in place, when the server reboots, the RPC reply cache is gone. How
does the Linux server handle retransmitted non-idempotent requests?
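The duplicate request (reply) cache Wendy refers to can be pictured as a map from a client's RPC transaction ID (XID) to the reply already computed for it. The toy model below uses entirely hypothetical names and is not the Linux nfsd implementation. It shows why losing that in-memory map across a reboot can turn a retransmitted REMOVE into a spurious NFS3ERR_NOENT, even though the original REMOVE succeeded.

```python
# Toy model of an NFS server's duplicate request cache (DRC). Everything here
# is hypothetical and in-memory; it is not the Linux nfsd implementation.
OK = "NFS3_OK"
ENOENT = "NFS3ERR_NOENT"


class ToyServer:
    def __init__(self, files):
        self.files = set(files)       # stands in for the (persistent) filesystem
        self.drc = {}                 # (client, xid) -> reply; lost on reboot

    def remove(self, client, xid, name):
        key = (client, xid)
        if key in self.drc:           # retransmission: resend the saved reply
            return self.drc[key]
        if name in self.files:
            self.files.discard(name)
            reply = OK
        else:
            reply = ENOENT
        self.drc[key] = reply
        return reply

    def reboot(self):
        self.drc = {}                 # filesystem survives, reply cache does not


if __name__ == "__main__":
    srv = ToyServer({"foo"})
    print(srv.remove("clientA", 1, "foo"))   # executed; reply lost in transit
    print(srv.remove("clientA", 1, "foo"))   # retransmit answered from the DRC
    srv.reboot()
    print(srv.remove("clientA", 1, "foo"))   # retransmit after reboot re-executes
```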

-- Wendy





^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [NFS] NFS Digest, Vol 18, Issue 70 (NFS performance problems)
  2007-11-30 16:27                       ` Wendy Cheng
@ 2007-12-03 20:31                         ` J. Bruce Fields
  2007-12-03 21:13                           ` Wendy Cheng
  0 siblings, 1 reply; 17+ messages in thread
From: J. Bruce Fields @ 2007-12-03 20:31 UTC (permalink / raw)
  To: Wendy Cheng; +Cc: chuck.lever, nfs, NeilBrown, Trond Myklebust

On Fri, Nov 30, 2007 at 11:27:16AM -0500, Wendy Cheng wrote:
> Trond Myklebust wrote:
>> Actually, the real problem would be dealing with something like
>> unlink('foo') followed by open('foo', O_CREAT|O_EXCL). How do you ensure
>> that a replay of those actions following a reboot is fully consistent in
>> the face of some other client attempting an open('foo', O_CREAT) at the
>> same time?
>>
>> The problem is that a number of directory operations involve exclusive
>> semantics, and so cannot be replayed. The solution to this sort of
>> problem is going to have to involve exclusive (i.e. write) directory
>> delegations to ensure that whatever transactions one client performs
>> cannot interfere with the transactions performed by another.
>>
>>   
>
> Well, a dumb question from me (borrowing Bruce's line :) ) ... even with
> "sync" in place, when the server reboots, the RPC reply cache is gone. How
> does the Linux server handle retransmitted non-idempotent requests?

Badly!

Somebody should figure out whether it would be possible for us to
implement persistent sessions in v4.1:

	http://www.nfsv4-editor.org/draft-17/draft-ietf-nfsv4-minorversion1-17.html#Persistence

It looks hard!
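Roughly, a v4.1 session gives the client a fixed table of slots. Each slot remembers the last sequence ID it saw and the reply it sent. A retransmission that reuses the slot's current sequence ID therefore gets the cached reply instead of being executed again. "Persistent" sessions would keep that table across a server reboot. The sketch below models one slot table and saves it to a JSON file purely for illustration. Several details are assumptions rather than a description of any real server: the class, method and file names, and the choice that the first request on a slot uses sequence ID 1.

```python
import json
import os
import tempfile

NFS4ERR_SEQ_MISORDERED = "NFS4ERR_SEQ_MISORDERED"


class SlotTable:
    """Hypothetical NFSv4.1-style slot table, optionally saved to a file."""

    def __init__(self, nslots, path=None):
        self.path = path
        if path and os.path.exists(path):
            with open(path) as f:           # "persistent": reload after a reboot
                state = json.load(f)
            self.seq, self.reply = state["seq"], state["reply"]
        else:
            self.seq = [0] * nslots         # first request on a slot uses seqid 1
            self.reply = [None] * nslots

    def _persist(self):
        if not self.path:
            return
        tmp = self.path + ".tmp"
        with open(tmp, "w") as f:
            json.dump({"seq": self.seq, "reply": self.reply}, f)
            f.flush()
            os.fsync(f.fileno())
        os.replace(tmp, self.path)          # atomic replace on POSIX filesystems

    def sequence(self, slot, seqid, execute):
        """Execute a new request, or return the cached reply for a retransmission."""
        if seqid == self.seq[slot] and self.reply[slot] is not None:
            return self.reply[slot]         # retry: do not execute again
        if seqid != self.seq[slot] + 1:
            return NFS4ERR_SEQ_MISORDERED
        result = execute()
        # A crash right here (effect applied, reply not yet saved) is the
        # window that makes real persistent sessions hard.
        self.seq[slot], self.reply[slot] = seqid, result
        self._persist()
        return result


if __name__ == "__main__":
    calls = []

    def remove_foo():
        calls.append("REMOVE foo")
        return "NFS4_OK"

    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "slots.json")
        SlotTable(4, path).sequence(0, 1, remove_foo)   # executed once
        table = SlotTable(4, path)                      # simulated server reboot
        print(table.sequence(0, 1, remove_foo), len(calls))
```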

--b.



^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [NFS] NFS Digest, Vol 18, Issue 70 (NFS performance problems)
  2007-12-03 20:31                         ` J. Bruce Fields
@ 2007-12-03 21:13                           ` Wendy Cheng
  2007-12-03 21:30                             ` J. Bruce Fields
  0 siblings, 1 reply; 17+ messages in thread
From: Wendy Cheng @ 2007-12-03 21:13 UTC (permalink / raw)
  To: J. Bruce Fields; +Cc: chuck.lever, nfs, NeilBrown, Trond Myklebust

J. Bruce Fields wrote:
> On Fri, Nov 30, 2007 at 11:27:16AM -0500, Wendy Cheng wrote:
>   
>> Well, a dumb question from me (borrowing Bruce's line :) ) ... even with
>> "sync" in place, when the server reboots, the RPC reply cache is gone. How
>> does the Linux server handle retransmitted non-idempotent requests?
>>     
>
> Badly!
>
> Somebody should figure out whether it would be possible for us to
> implement persistent sessions in v4.1:
>
> 	http://www.nfsv4-editor.org/draft-17/draft-ietf-nfsv4-minorversion1-17.html#Persistence
>
> It looks hard!
>   
Or use a cluster (a backup server is quite affordable nowadays)? I was
about to kick off a new discussion about this ...

I did a prototype about 4 years ago on a 2.4 kernel where the RPC reply
cache (slightly modified to include raw NFS request packets) was
mirrored by a backup server (in memory). The reply was held back from the
client until the backup server acknowledged the mirrored reply cache
entry. Upon a crash, the backup server piggybacked its logic on ext3's
journal recovery code. For reply cache entries not replayed or not
recognized by jbd, nfsd resent the raw NFS requests down to the
filesystem just like any newly arrived request. The prototype code was
able to reach at least 70% of the async mode performance without losing
data.
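Here is a minimal sketch of the ordering described above. The primary executes the request, ships the reply cache entry (with the raw request) to the backup, and releases the reply only after the backup acknowledges it. On takeover, the backup rebuilds its reply cache and re-executes any mirrored request that the filesystem journal does not show as committed. All class names are hypothetical, and the `journal` set is a stand-in for ext3/jbd recovery. The model also does not cover every crash window.

```python
# Hypothetical model of mirroring the reply cache to a backup server. The
# `journal` set stands in for what ext3/jbd recovery would report as
# committed; none of this is real nfsd or jbd code.


class SharedFS:
    """Storage visible to both servers: a set of names plus a toy journal."""

    def __init__(self, names):
        self.names = set(names)
        self.journal = set()          # request keys whose effects were committed

    def remove(self, key, name):
        if name not in self.names:
            return "NFS3ERR_NOENT"
        self.names.discard(name)
        self.journal.add(key)
        return "NFS3_OK"


class Backup:
    def __init__(self):
        self.mirror = {}              # (client, xid) -> (raw request, reply)

    def store(self, key, request, reply):
        self.mirror[key] = (request, reply)
        return True                   # acknowledgement back to the primary

    def take_over(self, fs):
        """Rebuild the reply cache; re-execute entries the journal does not show."""
        drc = {}
        for key, (name, reply) in self.mirror.items():
            if key not in fs.journal:
                reply = fs.remove(key, name)
            drc[key] = reply
        return drc


class Primary:
    def __init__(self, fs, backup):
        self.fs, self.backup, self.drc = fs, backup, {}

    def remove(self, key, name):
        if key in self.drc:
            return self.drc[key]
        reply = self.fs.remove(key, name)
        self.drc[key] = reply
        # Release the reply to the client only once the backup has acked.
        if not self.backup.store(key, name, reply):
            raise RuntimeError("backup did not acknowledge the mirrored entry")
        return reply


if __name__ == "__main__":
    fs, backup = SharedFS({"foo"}), Backup()
    Primary(fs, backup).remove(("clientA", 7), "foo")  # primary dies before replying
    drc = backup.take_over(fs)
    print(drc[("clientA", 7)])        # the client's retransmit gets the original reply
```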

Another issue with our current Linux-based NFS cluster failover is also
right in this arena: upon failover, non-idempotent requests could
introduce stale filehandle errors that have been causing headaches for
some applications. So mirroring the RPC reply cache (to another machine)
seems attractive.

Any comments? Mind if I write this up and send it out for discussion?

-- Wendy



^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [NFS] NFS Digest, Vol 18, Issue 70 (NFS performance problems)
  2007-12-03 21:13                           ` Wendy Cheng
@ 2007-12-03 21:30                             ` J. Bruce Fields
  2007-12-03 21:38                               ` J. Bruce Fields
  2007-12-03 21:49                               ` Wendy Cheng
  0 siblings, 2 replies; 17+ messages in thread
From: J. Bruce Fields @ 2007-12-03 21:30 UTC (permalink / raw)
  To: Wendy Cheng; +Cc: chuck.lever, nfs, NeilBrown, Trond Myklebust

On Mon, Dec 03, 2007 at 04:13:02PM -0500, Wendy Cheng wrote:
> Or use a cluster (a backup server is quite affordable nowadays)? I was
> about to kick off a new discussion about this ...
>
> I did a prototype about 4 years ago on a 2.4 kernel where the RPC reply cache
> (slightly modified to include raw NFS request packets) was mirrored by a
> backup server (in memory). The reply was held back from the client until the
> backup server acknowledged the mirrored reply cache entry. Upon a crash, the
> backup server piggybacked its logic on ext3's journal recovery code. For
> reply cache entries not replayed or not recognized by jbd, nfsd resent the
> raw NFS requests down to the filesystem just like any newly arrived request.
> The prototype code was able to reach at least 70% of the async mode
> performance without losing data.
>
> Another issue with our current Linux-based NFS cluster failover is
> also right in this arena: upon failover, non-idempotent requests could
> introduce stale filehandle errors that have been causing headaches for
> some applications.

How exactly do the stale filehandles happen?

> So mirroring the RPC reply cache (to another machine)
> seems attractive.
>
> Any comments? Mind if I write this up and send it out for discussion?

I'd be interested?

--b.



^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [NFS] NFS Digest, Vol 18, Issue 70 (NFS performance problems)
  2007-12-03 21:30                             ` J. Bruce Fields
@ 2007-12-03 21:38                               ` J. Bruce Fields
  2007-12-03 21:49                               ` Wendy Cheng
  1 sibling, 0 replies; 17+ messages in thread
From: J. Bruce Fields @ 2007-12-03 21:38 UTC (permalink / raw)
  To: Wendy Cheng; +Cc: chuck.lever, nfs, NeilBrown, Trond Myklebust

On Mon, Dec 03, 2007 at 04:30:04PM -0500, J. Bruce Fields wrote:
> On Mon, Dec 03, 2007 at 04:13:02PM -0500, Wendy Cheng wrote:
> > Or use a cluster (a backup server is quite affordable nowadays)? I was
> > about to kick off a new discussion about this ...
> >
> > I did a prototype about 4 years ago on a 2.4 kernel where the RPC reply cache
> > (slightly modified to include raw NFS request packets) was mirrored by a
> > backup server (in memory). The reply was held back from the client until the
> > backup server acknowledged the mirrored reply cache entry. Upon a crash, the
> > backup server piggybacked its logic on ext3's journal recovery code. For
> > reply cache entries not replayed or not recognized by jbd, nfsd resent the
> > raw NFS requests down to the filesystem just like any newly arrived request.
> > The prototype code was able to reach at least 70% of the async mode
> > performance without losing data.
> >
> > Another issue with our current Linux-based NFS cluster failover is
> > also right in this arena: upon failover, non-idempotent requests could
> > introduce stale filehandle errors that have been causing headaches for
> > some applications.
> 
> How exactly do the stale filehandles happen?
> 
> > So mirroring the RPC reply cache (to another machine)
> > seems attractive.
> >
> > Any comments? Mind if I write this up and send it out for discussion?
> 
> I'd be interested?

Um.  That was meant to be an !, not a ?.

--b.



^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [NFS] NFS Digest, Vol 18, Issue 70 (NFS performance problems)
  2007-12-03 21:30                             ` J. Bruce Fields
  2007-12-03 21:38                               ` J. Bruce Fields
@ 2007-12-03 21:49                               ` Wendy Cheng
  2007-12-03 22:07                                 ` J. Bruce Fields
  1 sibling, 1 reply; 17+ messages in thread
From: Wendy Cheng @ 2007-12-03 21:49 UTC (permalink / raw)
  To: J. Bruce Fields; +Cc: chuck.lever, nfs, NeilBrown, Trond Myklebust

J. Bruce Fields wrote:
> On Mon, Dec 03, 2007 at 04:13:02PM -0500, Wendy Cheng wrote:
>   
>> Or use a cluster (a backup server is quite affordable nowadays)? I was
>> about to kick off a new discussion about this ...
>>
>> I did a prototype about 4 years ago on a 2.4 kernel where the RPC reply cache
>> (slightly modified to include raw NFS request packets) was mirrored by a
>> backup server (in memory). The reply was held back from the client until the
>> backup server acknowledged the mirrored reply cache entry. Upon a crash, the
>> backup server piggybacked its logic on ext3's journal recovery code. For
>> reply cache entries not replayed or not recognized by jbd, nfsd resent the
>> raw NFS requests down to the filesystem just like any newly arrived request.
>> The prototype code was able to reach at least 70% of the async mode
>> performance without losing data.
>>
>> Another issue with our current Linux-based NFS cluster failover is
>> also right in this arena: upon failover, non-idempotent requests could
>> introduce stale filehandle errors that have been causing headaches for
>> some applications.
>>     
>
> How exactly do the stale filehandles happen?
>   

Unless someone has fixed it... last time we looked, one of the causes
was this:

A "delete" was successfully executed on one server, but failover
occurred before the reply went back to the client. The retransmitted
request was sent to the takeover server, which then couldn't find the
file (since it was already gone). A stale filehandle error (or maybe
EACCES or EPERM; I forget the details) was returned.

-- Wendy




^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [NFS] NFS Digest, Vol 18, Issue 70 (NFS performance problems)
  2007-12-03 21:49                               ` Wendy Cheng
@ 2007-12-03 22:07                                 ` J. Bruce Fields
  0 siblings, 0 replies; 17+ messages in thread
From: J. Bruce Fields @ 2007-12-03 22:07 UTC (permalink / raw)
  To: Wendy Cheng; +Cc: chuck.lever, nfs, NeilBrown, Trond Myklebust

On Mon, Dec 03, 2007 at 04:49:42PM -0500, Wendy Cheng wrote:
> J. Bruce Fields wrote:
>> On Mon, Dec 03, 2007 at 04:13:02PM -0500, Wendy Cheng wrote:
>>   
>>> Or use a cluster (a backup server is quite affordable nowadays)? I was
>>> about to kick off a new discussion about this ...
>>>
>>> I did a prototype about 4 years ago on a 2.4 kernel where the RPC reply
>>> cache (slightly modified to include raw NFS request packets) was mirrored
>>> by a backup server (in memory). The reply was held back from the client
>>> until the backup server acknowledged the mirrored reply cache entry. Upon
>>> a crash, the backup server piggybacked its logic on ext3's journal
>>> recovery code. For reply cache entries not replayed or not recognized by
>>> jbd, nfsd resent the raw NFS requests down to the filesystem just like
>>> any newly arrived request. The prototype code was able to reach at least
>>> 70% of the async mode performance without losing data.
>>>
>>> Another issue with our current Linux-based NFS cluster failover is also
>>> right in this arena: upon failover, non-idempotent requests could
>>> introduce stale filehandle errors that have been causing headaches for
>>> some applications.
>>>     
>>
>> How exactly do the stale filehandles happen?
>>   
>
> Unless someone has fixed it... last time we looked, one of the causes was
> this:
>
> A "delete" was successfully executed on one server, but failover occurred
> before the reply went back to the client. The retransmitted request was
> sent to the takeover server, which then couldn't find the file (since it
> was already gone). A stale filehandle error (or maybe EACCES or EPERM; I
> forget the details) was returned.

OK, makes sense.  But the REMOVE operation takes a filehandle (for a
parent) and a name (the name of the thing to remove), so if it's already
been removed you'd expect something like ENOENT.
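Bruce's point can be sketched with a toy REMOVE handler. The target is named by (parent directory filehandle, name). Replaying a REMOVE after the file is gone therefore fails the name check with NFS3ERR_NOENT. NFS3ERR_STALE would instead mean that the parent directory's filehandle itself no longer resolves. The function and the filehandle strings are hypothetical; only the NFSv3 status names are real.

```python
NFS3_OK = "NFS3_OK"
NFS3ERR_NOENT = "NFS3ERR_NOENT"
NFS3ERR_STALE = "NFS3ERR_STALE"


def nfs3_remove(directories, dir_fh, name):
    """Toy REMOVE: `directories` maps a directory filehandle to its set of names."""
    entries = directories.get(dir_fh)
    if entries is None:
        return NFS3ERR_STALE          # the parent handle no longer resolves
    if name not in entries:
        return NFS3ERR_NOENT          # parent is fine, the name is gone
    entries.remove(name)
    return NFS3_OK


if __name__ == "__main__":
    dirs = {"fh-export-root": {"foo"}}
    print(nfs3_remove(dirs, "fh-export-root", "foo"))   # original request
    print(nfs3_remove(dirs, "fh-export-root", "foo"))   # replay after failover
    print(nfs3_remove(dirs, "fh-unknown", "foo"))       # handle not recognized
```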

--b.



^ permalink raw reply	[flat|nested] 17+ messages in thread

end of thread [latest: ~2007-12-03 22:08 UTC]

Thread overview: 17+ messages
     [not found] <mailman.194659.1195581701.26582.nfs@lists.sourceforge.net>
2007-11-20 20:22 ` [NFS] NFS Digest, Vol 18, Issue 70 (NFS performance problems) Wendy Cheng
     [not found]   ` <BAY104-W43766F661E50E7FFCDA8EBA7F0-MsuGFMq8XAE@public.gmane.org>
2007-11-20 21:17     ` Peter Staubach
2007-11-20 21:23       ` Wendy Cheng
2007-11-21 16:04         ` Chuck Lever
2007-11-26  3:28           ` Wendy Cheng
2007-11-26  4:41             ` Trond Myklebust
2007-11-26  5:02             ` J. Bruce Fields
     [not found]               ` <18254.19187.470275.538680@notabene.brown>
     [not found]                 ` <18254.19187.470275.538680-wvvUuzkyo1EYVZTmpyfIwg@public.gmane.org>
2007-11-29  5:30                   ` Trond Myklebust
     [not found]                     ` <1196314230.7950.42.camel-rJ7iovZKK19ZJLDQqaL3InhyD016LWXt@public.gmane.org>
2007-11-30 16:27                       ` Wendy Cheng
2007-12-03 20:31                         ` J. Bruce Fields
2007-12-03 21:13                           ` Wendy Cheng
2007-12-03 21:30                             ` J. Bruce Fields
2007-12-03 21:38                               ` J. Bruce Fields
2007-12-03 21:49                               ` Wendy Cheng
2007-12-03 22:07                                 ` J. Bruce Fields
2007-11-27 18:40           ` Rui Pedro Mendes Salgueiro
     [not found]             ` <20071127184050.GA13791-/uZbGGhPd8Gmv2Aub4FSgw@public.gmane.org>
2007-11-28  1:50               ` Chuck Lever
