linux-nfs.vger.kernel.org archive mirror
From: "Mkrtchyan, Tigran" <tigran.mkrtchyan@desy.de>
To: "J. Bruce Fields" <bfields@fieldses.org>
Cc: Ulrich Gemkow <ulrich.gemkow@ikr.uni-stuttgart.de>,
	linux-nfs@vger.kernel.org
Subject: Re: NFSv4 mount fails on Sun Solaris 10 after reboot of client
Date: Mon, 31 Aug 2015 17:52:37 +0200 (CEST)
Message-ID: <1738066646.4791126.1441036357343.JavaMail.zimbra@desy.de>
In-Reply-To: <20150831145148.GB17812@fieldses.org>



----- Original Message -----
> From: "J. Bruce Fields" <bfields@fieldses.org>
> To: "Ulrich Gemkow" <ulrich.gemkow@ikr.uni-stuttgart.de>
> Cc: linux-nfs@vger.kernel.org
> Sent: Monday, August 31, 2015 4:51:48 PM
> Subject: Re: NFSv4 mount fails on Sun Solaris 10 after reboot of client

> On Mon, Aug 31, 2015 at 02:08:08PM +0200, Ulrich Gemkow wrote:
>> Hello Bruce,
>> 
>> On Wednesday 26 August 2015 22:09:40 you wrote:
>> > On Wed, Aug 26, 2015 at 09:54:22PM +0200, Ulrich Gemkow wrote:
>> > > Hello Bruce,
>> > > 
>> > > On Tuesday 25 August 2015 23:54:56 J. Bruce Fields wrote:
>> > > > The SERVERFAULT is on SETCLIENTID_CONFIRM.
>> > > > 
>> > > > In nfsd4_setclientid_confirm():
>> > > > 
>> > > > 	conf = find_confirmed_client(clid, false, nn);
>> > > > 	unconf = find_unconfirmed_client(clid, false, nn);
>> > > > 	/*
>> > > > 	 * We try hard to give out unique clientid's, so if we get an
>> > > > 	 * attempt to confirm the same clientid with a different cred,
>> > > > 	 * there's a bug somewhere.  Let's charitably assume it's our
>> > > > 	 * bug.
>> > > > 	 */
>> > > > 	status = nfserr_serverfault;
>> > > > 	if (unconf && !same_creds(&unconf->cl_cred, &rqstp->rq_cred))
>> > > > 		goto out;
>> > > > 	if (conf && !same_creds(&conf->cl_cred, &rqstp->rq_cred))
>> > > > 		goto out;
>> > > > 
>> > > > The SETCLIENTID and SETCLIENTID_CONFIRM are done with identical
>> > > > auth_unix creds.
>> > > > 
>> > > > The clientid we're looking up there was returned by the previous
>> > > > SETCLIENTID, generated by this logic:
>> > > > 
>> > > > 	if (conf && same_verf(&conf->cl_verifier, &clverifier))
>> > > > 		/* case 1: probable callback update */
>> > > > 		copy_clid(new, conf);
>> > > > 	else /* case 4 (new client) or cases 2, 3 (client reboot): */
>> > > > 		gen_clid(new, nn);
>> > > > 
>> > > > So it should be a brand new clientid, unless the client was reusing the old
>> > > > verifier.
>> > > > 
>> > > > So perhaps the client is sending the SETCLIENTID with a verifier set to what it
>> > > > used on the previous boot?  That sounds like a client bug.  The Linux
>> > > > client uses a timestamp for the verifier, looks like the Solaris client
>> > > > might too.  Is there some reason the clock on this client isn't
>> > > > advancing on reboot?
>> > > 
>> > > Thank you for the analysis. But the clock of the client advances
>> > > regularly, as one would expect.
>> > 
>> > OK, thanks for checking that.
>> > 
>> > > The client is SPARC Solaris 10 with the latest patches
>> > > applied - I cannot believe that this client has such a
>> > > basic NFS bug.
>> > 
>> > To confirm or deny my hypothesis, I think what we want is a longer
>> > capture that gets the failing SETCLIENTID_CONFIRM (as seen in the
>> > previous capture) but also shows what clientid the client was using
>> > before the reboot.  So ideal might be something like:
>> > 
>> > 	- start the capture
>> > 	- mount
>> > 	- create a file (I just want to make sure the client does at
>> > 	  least one open)
>> > 	- reboot the client
>> > 	- mount again, see the failure
>> > 	- stop the capture
>> 
>> I tried, but probably made a mistake: to be sure of a defined
>> state for the test, I rebooted the server, clearing all its NFS
>> state, and reinstalled the client, both with exactly the same
>> configuration as before.
>> 
>> Unfortunately, the bug no longer occurs: the mount always
>> succeeds. I had also reinstalled the client before my first
>> mail, so it seems the server may previously have reached an
>> invalid state, whatever may have caused it.
> 
> That's interesting!
> 
>> I can only wait until the bug happens again (hoping not :-).
>> 
>> Maybe you can find a cause in the information I sent
>> earlier. I'm sorry I can't be of more help. If I can do
>> anything, please tell me.
> 
> I'm not coming up with any ideas right now.  Do let us know if you get
> into that state again.
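The SETCLIENTID / SETCLIENTID_CONFIRM logic quoted above can be modelled in a minimal C sketch. Everything in it is hypothetical: the model_* names, a server holding a single client owner, and creds reduced to one uid. It is not the nfsd implementation. It only shows how a reused boot verifier makes SETCLIENTID hand back the old clientid (case 1), and how a stale confirmed record under that clientid with a different cred then trips the SERVERFAULT check. The differing uid is an illustrative assumption, not something observed in the captures.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, heavily simplified model of the two nfsd checks quoted
 * above. Not kernel code: one client owner, creds reduced to one uid,
 * no hashing, leases, or callbacks. */

enum model_status { MODEL_OK, MODEL_SERVERFAULT };

struct model_record {
    int in_use;
    uint64_t clid;      /* clientid handed out by SETCLIENTID */
    uint64_t verifier;  /* client boot verifier */
    uint32_t uid;       /* stand-in for the auth_unix cred */
};

struct model_server {
    struct model_record conf;    /* confirmed record, if any */
    struct model_record unconf;  /* unconfirmed record, if any */
    uint64_t next_clid;          /* source of fresh clientids */
};

/* SETCLIENTID: reuse the confirmed clientid only when the verifier is
 * unchanged (case 1); otherwise hand out a fresh one (cases 2-4). */
uint64_t model_setclientid(struct model_server *s, uint64_t verf, uint32_t uid)
{
    assert(s != NULL);
    struct model_record rec = { 1, 0, verf, uid };
    if (s->conf.in_use && s->conf.verifier == verf)
        rec.clid = s->conf.clid;     /* like copy_clid(new, conf) */
    else
        rec.clid = s->next_clid++;   /* like gen_clid(new, nn) */
    s->unconf = rec;
    return rec.clid;
}

/* SETCLIENTID_CONFIRM: any record found under this clientid with a
 * different cred yields SERVERFAULT, as in the quoted code. */
enum model_status model_confirm(struct model_server *s, uint64_t clid, uint32_t uid)
{
    assert(s != NULL);
    const struct model_record *conf =
        (s->conf.in_use && s->conf.clid == clid) ? &s->conf : NULL;
    const struct model_record *unconf =
        (s->unconf.in_use && s->unconf.clid == clid) ? &s->unconf : NULL;

    if (unconf && unconf->uid != uid)
        return MODEL_SERVERFAULT;
    if (conf && conf->uid != uid)
        return MODEL_SERVERFAULT;
    if (unconf) {                    /* promote unconfirmed to confirmed */
        s->conf = *unconf;
        s->unconf.in_use = 0;
    }
    return MODEL_OK;
}
```

In this model, a client that comes back with a fresh verifier gets a new clientid and confirms cleanly, even if its cred changed. A client that reuses its old verifier gets the old clientid back, and if the confirmed record from the previous boot carries a different uid, the confirm returns MODEL_SERVERFAULT. Identical creds in the new SETCLIENTID and SETCLIENTID_CONFIRM do not prevent this.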


To me it sounds like the server still holds a reference keyed by the client's
owner ID and fails to detect that the old verifier is no longer valid: some
kind of leak in the client disconnect/reboot detection code (although the code
looks like it does the right thing). I don't have enough insight into the
Linux server to verify that.
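The hypothesis above can be illustrated with a second hypothetical sketch: an owner-keyed table in which a SETCLIENTID carrying a new verifier for a known owner should retire that owner's old record. The purge_stale flag stands in for correct reboot detection. With it off, the stale record lingers next to the new one, which is the kind of leak suspected here. The names, the fixed-size table and the owner string are assumptions for illustration, not the Linux server's data structures.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical model of owner-keyed client state; not nfsd code. */
#define MAX_RECORDS 8

struct owner_record {
    int in_use;
    char owner[32];      /* client's owner id string */
    uint64_t verifier;   /* boot verifier sent in SETCLIENTID */
};

struct owner_table {
    struct owner_record rec[MAX_RECORDS];
};

/* Record a SETCLIENTID from `owner` with boot verifier `verf`.
 * If purge_stale is nonzero, a record for the same owner with a
 * different verifier is treated as a reboot and dropped; if zero,
 * it is left behind, modelling the suspected leak.
 * Returns 0 on success, -1 if the table is full. */
int record_setclientid(struct owner_table *t, const char *owner,
                       uint64_t verf, int purge_stale)
{
    assert(t != NULL && owner != NULL);
    for (size_t i = 0; i < MAX_RECORDS; i++) {
        struct owner_record *r = &t->rec[i];
        if (!r->in_use || strcmp(r->owner, owner) != 0)
            continue;
        if (r->verifier == verf)
            return 0;              /* same boot instance: nothing new */
        if (purge_stale)
            r->in_use = 0;         /* reboot detected: drop old state */
    }
    for (size_t i = 0; i < MAX_RECORDS; i++) {
        struct owner_record *r = &t->rec[i];
        if (!r->in_use) {
            r->in_use = 1;
            strncpy(r->owner, owner, sizeof r->owner - 1);
            r->owner[sizeof r->owner - 1] = '\0';
            r->verifier = verf;
            return 0;
        }
    }
    return -1;
}

/* Count how many records the table holds for `owner`. */
size_t records_for_owner(const struct owner_table *t, const char *owner)
{
    size_t n = 0;
    for (size_t i = 0; i < MAX_RECORDS; i++)
        if (t->rec[i].in_use && strcmp(t->rec[i].owner, owner) == 0)
            n++;
    return n;
}
```

After two SETCLIENTIDs from the same owner with different verifiers, the purging table holds one record for that owner and the non-purging one holds two. The second, stale entry is what could later be found and compared against a new request.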

Tigran.


> 
> --b.


Thread overview: 14+ messages
2015-08-24 12:52 NFSv4 mount fails on Sun Solaris 10 after reboot of client Ulrich Gemkow
2015-08-24 20:14 ` J. Bruce Fields
2015-08-25 17:28   ` Ulrich Gemkow
2015-08-25 21:54     ` J. Bruce Fields
2015-08-26 19:54       ` Ulrich Gemkow
2015-08-26 20:09         ` J. Bruce Fields
2015-08-31 12:08           ` Ulrich Gemkow
2015-08-31 14:51             ` J. Bruce Fields
2015-08-31 15:52               ` Mkrtchyan, Tigran [this message]
2015-08-27  6:43       ` Mkrtchyan, Tigran
2015-08-27 18:29         ` J. Bruce Fields
2015-08-27 20:36           ` Frank Filz
2015-08-28 18:06             ` 'J. Bruce Fields'
2015-09-01 17:43               ` 'J. Bruce Fields'
