All of lore.kernel.org
 help / color / mirror / Atom feed
From: Joschi Brauchle <joschi.brauchle@tum.de>
To: Benjamin Coddington <bcodding@redhat.com>
Cc: "linux-nfs@vger.kernel.org" <linux-nfs@vger.kernel.org>
Subject: Re: NFS Client looping with STALE_STATE_ID error - 100% load on one CPU core
Date: Tue, 05 May 2015 20:43:29 +0200	[thread overview]
Message-ID: <55490F51.2080508@tum.de> (raw)
In-Reply-To: <alpine.OSX.2.19.9992.1505051301510.946@planck.local>

[-- Attachment #1: Type: text/plain, Size: 3199 bytes --]

Hello Ben,

thanks for the hint. I checked the kernel and it already includes this 
patch, but still gets stuck in this STALE_ID -> RENEW -> STALE_ID loop...

Best regards,
J Brauchle

On 05/05/2015 07:24 PM, Benjamin Coddington wrote:
> Hi J Brauchle,
>
> Are you producing this on a kernel that has "8faaa6d Fixing lease renewal"?
> https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=8faaa6d5d48b2015
>
> If not, I think that might be what you need.
>
> Ben
>
> On Thu, 23 Apr 2015, Joschi Brauchle wrote:
>
>> Hello everyone,
>>
>> I am posting to this list as suggested by Neil F Brown in this OpenSUSE
>> bugzilla: https://bugzilla.novell.com/show_bug.cgi?id=909634
>>
>>
>> We are mounting user homes and other shares using 'sec=krb' and NFSv4. With
>> kernel 3.16.7+ (includes latest NFS patches, please see BZ link for details)
>> we are seeing 100% CPU core usage by a kernel process on some machine
>> connected to a 'NetApp FAS 3140 with ONTAP 8.1.4P1 7-mode' NFSv3/4 Server. It
>> looks like the problem is triggered when the NFS server is unavailable for a
>> short period of time or was rebooted.
>>
>> Neil F Brown has come to the following conclusion so far:
>> ------------- QUOTE -------------
>> What I see happening here is that a WRITE or LOCK request gets
>> NFS4ERR_STALE_STATEID, and then the client sends a "RENEW" request.
>>
>> If the stateid is really stale, that should fail, but it doesn't. So the
>> client assumes everything is OK and keeps sending requests .... which fail
>> with NFS4ERR_STALE_STATEID.
>>
>> STALE_STATEID means:
>>
>>        A stateid generated by an earlier server instance was used.
>>
>> which means that the server must have restarted since that stateid was issued.
>>
>> So there seem to be two possible explanations:
>>
>> 1/ The server is lying.  The stateid isn't really 'stale'.  Maybe it is 'bad'.
>> 2/ The client has previously handled a STALE_STATEID error and created a new
>> "client" handle, but hasn't refreshed all of the stateids that it was holding.
>> So it is trying to use old stateid even though it has established a new
>> client.
>>
>> The client code looks OK, though it is sufficiently complex that I cannot be
>> certain.  There are no patches in more recent versions.
>> -------------  END  -------------
>>
>> We have some RPC/NFS debug logs available here:
>> https://bugzilla.novell.com/attachment.cgi?id=631962
>> And a wireshark trace here:
>> https://bugzilla.novell.com/attachment.cgi?id=632104
>>
>>
>> What I cannot do:
>> - Trigger or reproduce the problem reliably. It comes up every once in a
>> while. Around 2-3 times a month on a handful of ~50 machines.
>> - Provide logs from with the latest kernel (because I cannot trigger the
>> problem by any means. Updating all ~50 machines is not an option)
>>
>>
>> What I can do:
>> - Provide logs from or possibly even access to the machine in the 'broken'
>> state for the next couple of days. I will have to reboot it at some time to
>> take the load of the NFS server.
>>
>>
>>
>> I would be grateful for any help or comments.
>>
>> -J Brauchle


[-- Attachment #2: S/MIME Cryptographic Signature --]
[-- Type: application/pkcs7-signature, Size: 4917 bytes --]

      reply	other threads:[~2015-05-05 18:43 UTC|newest]

Thread overview: 3+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2015-04-23 10:16 NFS Client looping with STALE_STATE_ID error - 100% load on one CPU core Joschi Brauchle
2015-05-05 17:24 ` Benjamin Coddington
2015-05-05 18:43   ` Joschi Brauchle [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=55490F51.2080508@tum.de \
    --to=joschi.brauchle@tum.de \
    --cc=bcodding@redhat.com \
    --cc=linux-nfs@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.