From: Harry Edmon <harry@uw.edu>
To: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: Chuck Lever <chuck.lever@oracle.com>, linux-nfs@vger.kernel.org
Subject: Re: 2.6.38.6 - state manager constantly respawns
Date: Mon, 16 May 2011 13:37:23 -0700	[thread overview]
Message-ID: <4DD18B03.1050101@uw.edu> (raw)
In-Reply-To: <1305578007.19725.24.camel-SyLVLa/KEI9HwK5hSS5vWB2eb7JE58TQ@public.gmane.org>

On 05/16/11 13:33, Trond Myklebust wrote:
> On Mon, 2011-05-16 at 16:21 -0400, Chuck Lever wrote:
>    
>> On May 16, 2011, at 3:43 PM, Trond Myklebust wrote:
>>
>>      
>>> On Mon, 2011-05-16 at 12:36 -0700, Harry Edmon wrote:
>>>        
>>>> On 05/16/11 12:22, Chuck Lever wrote:
>>>>          
>>>>> On May 16, 2011, at 3:12 PM, Harry Edmon wrote:
>>>>>
>>>>>
>>>>>            
>>>>>> Attached is 1000 lines of output from tshark when the problem is occurring.   The client and server are connected by a private ethernet.
>>>>>>
>>>>>>              
>>>>> Disappointing: tshark is not telling us the return codes.  However, I see "PUTFH;READ" then "RENEW" in a loop, which indicates the state manager thread is being kicked off because of ongoing difficulties with state recovery.  Is there a stuck application on that client?
>>>>>
>>>>> Try again with "tshark -V".
>>>>>
>>>>>            
>>>> Here is the output from tshark -V (first 50,000 lines).   Nothing
>>>> appears to be stuck, and as I said when I reboot the client into 2.6.32
>>>> the problem goes away, only to reappear when I reboot it back into 2.6.38.6.
>>>>
>>>>          
>>> Possibly, but it definitely indicates a server bug. What kind of server
>>> are you using?
>>>
>>> Basically, the client is getting confused because when it sends a READ,
>>> the server is telling it that the lease has expired, then when it sends
>>> a RENEW, the same server replies that the lease is OK...
>>>        
>> I've seen this during migration recovery testing.  The client may be testing the wrong client ID.
>>
>> But I wonder if it's really worth doing that separate RENEW.  I've seen the client send a RENEW after it gets STALE_STATEID.  Would RENEW really tell the client anything in that case?
>>      
> It is needed.
>
> Without the RENEW, we have no idea whether or not we need to do a full
> state recovery. Running a full recovery when we don't have to is _bad_,
> and will usually cause us to lose delegations and may possibly even
> cause us to lose locks.
>
>    
By the way, this is not the only client/server pair running 2.6.38 on
which I have this problem.  It is occurring on other random ones I
maintain.  This example happens to be the cleanest one I have: this
NFS server is only talking to this specific NFS client over a private
network.
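
To make the loop described in the quoted exchange above concrete, here is
a toy model in Python. It is only a sketch under stated assumptions, not
the kernel's state manager: the class and function names are hypothetical,
and I am assuming the "lease has expired" reply corresponds to
NFS4ERR_EXPIRED. It shows why a server that fails READ but accepts RENEW
keeps the client cycling through READ and RENEW without ever reaching a
full recovery, while a self-consistent server lets the client recover once
and move on.

```python
from enum import Enum


class Status(Enum):
    # Status names and values as defined for NFSv4 in RFC 3530.
    NFS4_OK = 0
    NFS4ERR_EXPIRED = 10011


class InconsistentServer:
    """Hypothetical server: READ says the lease expired, RENEW says it is fine."""

    def read(self):
        return Status.NFS4ERR_EXPIRED

    def renew(self):
        return Status.NFS4_OK

    def reestablish(self):
        # Never reached in this model, because RENEW always succeeds.
        pass


class ConsistentServer:
    """Hypothetical server whose READ and RENEW agree about the lease."""

    def __init__(self):
        self.lease_valid = False

    def read(self):
        return Status.NFS4_OK if self.lease_valid else Status.NFS4ERR_EXPIRED

    def renew(self):
        return Status.NFS4_OK if self.lease_valid else Status.NFS4ERR_EXPIRED

    def reestablish(self):
        # Stands in for full state recovery (assumed to restore the lease).
        self.lease_valid = True


def run_client(server, max_rounds=5):
    """Simplified client loop: READ, and on failure check the lease with RENEW.

    Full recovery is only run when RENEW also fails, mirroring the point
    that an unnecessary full recovery should be avoided.
    """
    log = []
    for _ in range(max_rounds):
        status = server.read()
        log.append(("READ", status.name))
        if status is Status.NFS4_OK:
            return log
        # READ failed: the "state manager" runs and tests the lease.
        status = server.renew()
        log.append(("RENEW", status.name))
        if status is not Status.NFS4_OK:
            server.reestablish()
            log.append(("RECOVER", "full"))
        # If RENEW succeeded, the client sees no reason to recover and
        # simply retries the READ, which is where the loop comes from.
    return log


if __name__ == "__main__":
    for entry in run_client(InconsistentServer(), max_rounds=3):
        print(entry)
```

With InconsistentServer the log is nothing but alternating READ and RENEW
entries until max_rounds runs out, which is the same shape as the
"PUTFH;READ" then "RENEW" pattern in the capture. With ConsistentServer
the client runs one recovery and the next READ succeeds.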
-- 

  Dr. Harry Edmon			E-MAIL: harry@uw.edu
  206-543-0547 FAX: 206-543-0308			harry-qmPYOCrcNLLyFCzt5hm0YvZ8FUJU4vz8@public.gmane.org
  Director of IT, College of the Environment and
  Director of Computing, Dept of Atmospheric Sciences
  University of Washington, Box 351640, Seattle, WA 98195-1640


