From: Chuck Lever <chuck.lever@oracle.com>
To: Whoop Whouzer <tiredandnumb@gmail.com>
Cc: "Muntz, Daniel" <Dan.Muntz@netapp.com>,
	Peter Chacko <peterchacko35@gmail.com>,
	linux-nfs@vger.kernel.org
Subject: Re: nfs client performance while server is down
Date: Mon, 25 Jan 2010 16:26:09 -0500
Message-ID: <699604E4-B075-4E94-806F-CA06BCE6E9DA@oracle.com>
In-Reply-To: <d7f0b3a81001251318k42de9be2qe54f83bbd86cabb8-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>

On Jan 25, 2010, at 4:18 PM, Whoop Whouzer wrote:
> Any idea how I could do that?

The strace(1) man page says "-f" follows children.  In any event, you
can strace the running child processes using "strace -p <pid>".
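
A minimal sketch of both approaches, written as POSIX shell helpers.
The function names, the /tmp output prefix, and the mount point in the
usage lines are placeholders rather than anything from your setup; the
-f, -ff, -o and -p options are described in strace(1).

```shell
#!/bin/sh
# Hypothetical helpers: names and paths are illustrative only.

# Start a command under strace and follow every child it forks.
# With -ff and -o, strace writes one file per process: PREFIX.PID
trace_with_children() {
    prefix=$1
    shift
    strace -f -ff -o "$prefix" "$@"
}

# Attach to a process that is already running (for example a forked
# nautilus), following any children it creates from then on.
trace_running() {
    pid=$1
    strace -f -p "$pid" -o "/tmp/strace.$pid"
}

# Print the traced calls whose arguments mention a path under the
# given mount point (matches the literal text "MOUNT_POINT...).
calls_under_mount() {
    mount_point=$1
    shift
    grep -F "\"$mount_point" "$@"
}
```

Something like "trace_with_children /tmp/nautilus.trace nautilus"
followed by "calls_under_mount /mnt/nfsshare /tmp/nautilus.trace.*"
(with your real mount point in place of /mnt/nfsshare) might show
which child is touching the share.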

> On Mon, Jan 25, 2010 at 10:01 PM, Chuck Lever  
> <chuck.lever@oracle.com> wrote:
>> On Jan 25, 2010, at 2:38 PM, Whoop Whouzer wrote:
>>>
>>> Running "strace nautilus" gives me a lot of output. When I run it
>>> while the server is down it completes the trace without a hiccup, it
>>> returns, and then nautilus is launched and hangs.
>>> There are differences between the traces (with server up and server
>>> down). I can't really see where the problem lies in there.
>>
>> I would expect that the command-line nautilus forks when it starts up.
>> If it has some option you can specify to prevent that, it might allow
>> a deeper look.  You would need to tell strace to look at the children,
>> too.
>>
>>> On Mon, Jan 25, 2010 at 8:08 PM, Chuck Lever
>>> <chuck.lever@oracle.com> wrote:
>>>>
>>>> On Jan 25, 2010, at 2:02 PM, Whoop Whouzer wrote:
>>>>>
>>>>> Ok, I did that. After shutting down the server and enabling debug
>>>>> tracing I tried to open the home folder of the current account
>>>>> (totally unrelated to the nfs share); it wouldn't open at all, and no
>>>>> nautilus window appeared. While my cursor was in busy mode I got the
>>>>> following messages in kern.log (for the ubuntu 10.04 client):
>>>>> Jan 25 19:30:13 whoop-desktop kernel: [  160.719262] NFS call  fsstat
>>>>> Jan 25 19:30:37 whoop-desktop kernel: [  184.458611] NFS: permission(0:16/74386), mask=0x10, res=0
>>>>> Jan 25 19:30:37 whoop-desktop kernel: [  184.458647] NFS call  access
>>>>> Jan 25 19:30:43 whoop-desktop kernel: [  190.721086] nfs: server 192.168.1.130 not responding, timed out
>>>>> Jan 25 19:30:43 whoop-desktop kernel: [  190.721113] NFS reply statfs: -5
>>>>> Jan 25 19:30:43 whoop-desktop kernel: [  190.721116] nfs_statfs: statfs error = 5
>>>>> This series of traces repeats over and over at a set interval
>>>>> (there is no flooding of the logs), even if I do nothing.
>>>>> It's even worse than I thought: when I tried to shut down, the
>>>>> machine wouldn't shut down because it claimed the "File manager" was
>>>>> still running (although it was not visible on screen), so I had to
>>>>> kill that before I could shut down properly.
>>>>>
>>>>> In Fedora 12 I had a similar user experience (nautilus did show up
>>>>> without showing any contents and it was hanging). I had enabled
>>>>> tracing and it seems to be logged to /var/log/messages. I got this
>>>>> output in fedora:
>>>>> Jan 25 20:48:38 localhost kernel: NFS reply statfs: -5
>>>>> Jan 25 20:48:38 localhost kernel: nfs_statfs: statfs error = 5
>>>>> Jan 25 20:48:38 localhost kernel: NFS call  fsstat
>>>>> Jan 25 20:49:14 localhost kernel: nfs: server 192.168.1.130 not responding, timed out
>>>>> Jan 25 20:49:14 localhost kernel: NFS reply getattr: -5
>>>>> Jan 25 20:49:14 localhost kernel: nfs_revalidate_inode: (0:14/74386) getattr failed, error=-5
>>>>> Jan 25 20:49:25 localhost kernel: NFS: revalidating (0:14/74386)
>>>>> Jan 25 20:49:25 localhost kernel: NFS call  getattr
>>>>> Jan 25 20:50:14 localhost kernel: nfs: server 192.168.1.130 not responding, timed out
>>>>> Jan 25 20:50:14 localhost kernel: NFS reply access: -5
>>>>> Jan 25 20:50:14 localhost kernel: NFS: permission(0:14/74386), mask=0x1, res=-5
>>>>> Jan 25 20:50:14 localhost kernel: NFS call  access
>>>>> Jan 25 20:51:14 localhost kernel: nfs: server 192.168.1.130 not responding, timed out
>>>>> Jan 25 20:51:14 localhost kernel: NFS reply statfs: -5
>>>>> Jan 25 20:51:14 localhost kernel: nfs_statfs: statfs error = 5
>>>>> Jan 25 20:51:14 localhost kernel: NFS call  fsstat
>>>>> Most of the trace repeats at set intervals as well; there is no
>>>>> flooding of the logs...
>>>>> Fedora would not shut down normally either.
>>>>
>>>> This verifies that your client is attempting to access the NFS server,
>>>> but doesn't tell us which file it's attempting to access.  Essentially
>>>> the EIO means "failed to connect".
>>>>
>>>> Maybe try an strace of the nautilus process next?
>>>>
>>>>> On Mon, Jan 25, 2010 at 5:48 PM, Chuck Lever
>>>>> <chuck.lever@oracle.com> wrote:
>>>>>>
>>>>>> On Jan 24, 2010, at 7:09 PM, Whoop Whouzer wrote:
>>>>>>>
>>>>>>> I did some network traces and there is nothing strange happening as
>>>>>>> far as I can tell. I shut down the server (some network traffic
>>>>>>> occurred, as is to be expected). It got quiet again, I launched
>>>>>>> nautilus, it got stuck without displaying anything, and there was no
>>>>>>> real network activity except 3 ARP broadcasts asking where the
>>>>>>> server was (could be just coincidence).
>>>>>>
>>>>>> That sounds like the client does want to reconnect with the server.
>>>>>>
>>>>>> You could try enabling debug tracing on your client (sudo rpcdebug
>>>>>> -m nfs -s all) after shutting down your server, then try to start
>>>>>> nautilus.  The kernel log would then contain NFS-related messages
>>>>>> that might indicate where to look next.
>>>>>>
>>>>>>> Closing nautilus and launching it again will let it hang again but I
>>>>>>> see no additional network traffic. After a while nautilus will
>>>>>>> display the contents of the folder without any network traffic.
>>>>>>>
>>>>>>> On Sun, Jan 24, 2010 at 10:34 PM, Muntz, Daniel
>>>>>>> <Dan.Muntz@netapp.com> wrote:
>>>>>>>>
>>>>>>>> Perhaps something in your $PATH is in the NFS mount?  Do a network
>>>>>>>> trace and maybe you can see if, in fact, there are actually NFS
>>>>>>>> operations being attempted that you weren't expecting.  Then try to
>>>>>>>> figure out why.
>>>>>>>>
>>>>>>>>  -Dan
>>>>>>>>
>>>>>>>>> -----Original Message-----
>>>>>>>>> From: Whoop Whouzer [mailto:tiredandnumb@gmail.com]
>>>>>>>>> Sent: Saturday, January 23, 2010 8:28 AM
>>>>>>>>> To: Peter Chacko
>>>>>>>>> Cc: linux-nfs@vger.kernel.org
>>>>>>>>> Subject: Re: nfs client performance while server is down
>>>>>>>>>
>>>>>>>>> I don't remember all the different set-ups I tried it on, but I just
>>>>>>>>> confirmed this with the following combinations:
>>>>>>>>>
>>>>>>>>> ubuntu server 10.04 (alpha 2) --> ubuntu desktop 9.10, ubuntu
>>>>>>>>> desktop 10.04 (alpha 2), fedora 12
>>>>>>>>> ubuntu server 9.10 --> ubuntu desktop 9.10, ubuntu desktop 10.04
>>>>>>>>> (alpha 2), fedora 12
>>>>>>>>>
>>>>>>>>> I'll be happy to test it on another client machine (distro), or even
>>>>>>>>> another server (although that would require a little more time).
>>>>>>>>>
>>>>>>>>> Here are some examples of the bug reports I noticed and how they do
>>>>>>>>> not seem to get solved:
>>>>>>>>> https://bugzilla.redhat.com/show_bug.cgi?id=175283
>>>>>>>>> https://bugs.launchpad.net/ubuntu/+source/nfs-utils/+bug/164120
>>>>>>>>>
>>>>>>>>> regards,
>>>>>>>>> Whoop
>>>>>>>>>
>>>>>>>>> On Sat, Jan 23, 2010 at 4:57 PM, Peter Chacko
>>>>>>>>> <peterchacko35@gmail.com> wrote:
>>>>>>>>>>
>>>>>>>>>> On which client OS did you observe this behavior? This has
>>>>>>>>>> nothing to do with NFS design, which is purely stateless. It is
>>>>>>>>>> up to the client OS implementation how to deal with local I/O
>>>>>>>>>> when an NFS share gets disconnected.
>>>>>>>>>>
>>>>>>>>>> It may be a VFS bug in the local OS where you found this problem.
>>>>>>>>>>
>>>>>>>>>> thanks
>>>>>>>>>>
>>>>>>>>>> On Sat, Jan 23, 2010 at 9:15 PM, Whoop Whouzer
>>>>>>>>>> <tiredandnumb@gmail.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>> Howdy,
>>>>>>>>>>>
>>>>>>>>>>> I was wondering why nfs is designed in such a way that the
>>>>>>>>>>> performance of an nfs client machine gets very bad when the nfs
>>>>>>>>>>> server is offline? This is even the case with a soft mount
>>>>>>>>>>> (either via mount or fstab).
>>>>>>>>>>> Just about every application that requires disk access (not
>>>>>>>>>>> talking about nfs share access) gets really slow or
>>>>>>>>>>> unresponsive. For instance, nautilus becomes unresponsive when
>>>>>>>>>>> displaying the contents of any folder on the local disk,
>>>>>>>>>>> playing movie files (stored on local disk) makes totem or vlc
>>>>>>>>>>> get stuck at set intervals, and even the terminal becomes
>>>>>>>>>>> unresponsive at times.
>>>>>>>>>>>
>>>>>>>>>>> I could understand that these problems would occur while
>>>>>>>>>>> accessing the nfs share directory while the server is offline,
>>>>>>>>>>> but why for totally unrelated directories?
>>>>>>>>>>>
>>>>>>>>>>> I have experienced this behaviour on various distros, and also
>>>>>>>>>>> found various bug reports on this issue; they don't seem to get
>>>>>>>>>>> solved as this is viewed as nfs design.
>>>>>>>>>>> I see this as a flaw because clients are totally dependent on
>>>>>>>>>>> the server. This would be less of a deal if the entire home
>>>>>>>>>>> directory were stored on nfs (although I even think some sort
>>>>>>>>>>> of synchronisation technology could and should be implemented
>>>>>>>>>>> in this case). It is a bit odd that (technically) one machine
>>>>>>>>>>> serving some "useless" files to a non-trivial directory on
>>>>>>>>>>> client machines can take down these client machines.
>>>>>>>>>>>
>>>>>>>>>>> For me the preferred functionality would be:
>>>>>>>>>>> * If an nfs server goes offline, the client's nfs share becomes
>>>>>>>>>>>   inaccessible, but local directories and applications (that
>>>>>>>>>>>   only require local disk access) stay responsive.
>>>>>>>>>>> * If an nfs server comes back online (after being offline while
>>>>>>>>>>>   the client has not been restarted), the nfs share becomes
>>>>>>>>>>>   reconnected.
>>>>>>>>>>>
>>>>>>>>>>> regards,
>>>>>>>>>>> Whoop
>>>>>>>>>>
>>>>>>>>
>>>>>>
>>>>>> --
>>>>>> Chuck Lever
>>>>>> chuck[dot]lever[at]oracle[dot]com
>>>>
>>> <stracesdiff.log>
>>

--
Chuck Lever
chuck[dot]lever[at]oracle[dot]com



