linux-nfs.vger.kernel.org archive mirror
From: "Frank Filz" <ffilzlnx@mindspring.com>
To: "'J. Bruce Fields'" <bfields@fieldses.org>,
	"'Mkrtchyan, Tigran'" <tigran.mkrtchyan@desy.de>
Cc: "'Ulrich Gemkow'" <ulrich.gemkow@ikr.uni-stuttgart.de>,
	<linux-nfs@vger.kernel.org>
Subject: RE: NFSv4 mount fails on Sun Solaris 10 after reboot of client
Date: Thu, 27 Aug 2015 13:36:38 -0700
Message-ID: <008901d0e108$13caa520$3b5fef60$@mindspring.com>
In-Reply-To: <20150827182922.GB11819@fieldses.org>

> On Thu, Aug 27, 2015 at 08:43:51AM +0200, Mkrtchyan, Tigran wrote:
> >
> >
> > ----- Original Message -----
> > > From: "J. Bruce Fields" <bfields@fieldses.org>
> > > To: "Ulrich Gemkow" <ulrich.gemkow@ikr.uni-stuttgart.de>
> > > Cc: linux-nfs@vger.kernel.org
> > > Sent: Tuesday, August 25, 2015 11:54:56 PM
> > > Subject: Re: NFSv4 mount fails on Sun Solaris 10 after reboot of
> > > client
> >
> > > On Tue, Aug 25, 2015 at 07:28:03PM +0200, Ulrich Gemkow wrote:
> > >> Hello Bruce,
> > >>
> > >> On Monday 24 August 2015 22:14:01 J. Bruce Fields wrote:
> > >> > On Mon, Aug 24, 2015 at 02:52:55PM +0200, Ulrich Gemkow wrote:
> > >> > > we have a weird problem with Linux NFSv4.0 Server (Vanilla
> > >> > > Kernel 4.1.6) and a Sun Solaris 10 client (all patches applied):
> > >> > >
> > >> > > When mounting a share on the Solaris client and then rebooting
> > >> > > the client without unmounting the share first, after the reboot
> > >> > > every attempt to mount the share again gives an I/O error on
> > >> > > the client and the mount fails.
> > >> > >
> > >> > > After a long time (several hours) the v4 mount suddenly works
> > >> > > again.
> > >> > >
> > >> > > Mounting a share with vers=2 works always even in times when
> > >> > > the v4 mount fails.
> > >> > >
> > >> > > So it seems the Linux NFSv4 server holds a state for the client
> > >> > > which prevents the re-mounting of the share and gives the
> > >> > > I/O-error on the client.
> > >> > >
> > >> > > We use NFSv4 without idmapd.
> > >> > >
> > >> > > Is there any tip how to debug or solve this?
> > >> >
> > >> > Best is probably to get a packet trace.  So something like:
> > >> >
> > >> > 	tcpdump -s0 -iem0 -wtmp.pcap
> > >> >
> > >> > and then try the client mount, then kill the tcpdump after the
> > >> > mount fails, and send us tmp.pcap.  (And/or take a look at
> > >> > tmp.pcap yourself with wireshark.  The interesting question is
> > >> > what kind of error the server is returning when the client tries
> > >> > the mount after reboot.)
> > >>
> > >> Thank you for your reply. The tcpdump is attached, the relevant
> > >> packets are 49..52. The error seems to be a SERVERFAULT. Can you
> > >> see more from the dump?
> > >>
> > >> Thanks again and best regards
> > >
> > > The SERVERFAULT is on SETCLIENTID_CONFIRM.
> > >
> > > In nfsd4_setclientid_confirm():
> > >
> > >	conf = find_confirmed_client(clid, false, nn);
> > >	unconf = find_unconfirmed_client(clid, false, nn);
> > >	/*
> > >	 * We try hard to give out unique clientid's, so if we get an
> > >	 * attempt to confirm the same clientid with a different cred,
> > >	 * there's a bug somewhere.  Let's charitably assume it's our
> > >	 * bug.
> > >	 */
> > >	status = nfserr_serverfault;
> > >	if (unconf && !same_creds(&unconf->cl_cred, &rqstp->rq_cred))
> > >		goto out;
> > >	if (conf && !same_creds(&conf->cl_cred, &rqstp->rq_cred))
> > >		goto out;

If the creds don't match, the return should be NFS4ERR_CLID_INUSE, per the
first bullet after the DRC discussion in section 16.34.5 (IMPLEMENTATION).

At least the way I read RFC 7530...
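
To make the suggestion concrete, here is a minimal user-space sketch of the
check quoted above with only the error code changed. It is not the nfsd code:
struct cred_model, struct client_model, same_creds_model() and
confirm_status() are hypothetical stand-ins, and comparing uid/gid is an
assumption about what "same creds" means. Only the numeric value of
NFS4ERR_CLID_INUSE comes from the protocol.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Protocol status values (NFSv4.0). */
enum { NFS4_OK = 0, NFS4ERR_CLID_INUSE = 10017 };

/* Hypothetical, simplified stand-ins for the server's credential and
 * client records; the real kernel structures carry far more state. */
struct cred_model { uint32_t uid; uint32_t gid; };
struct client_model { struct cred_model cred; };

/* Assumption: two creds are "the same" if uid and gid match. */
static bool same_creds_model(const struct cred_model *a,
			     const struct cred_model *b)
{
	return a->uid == b->uid && a->gid == b->gid;
}

/* Returns NFS4ERR_CLID_INUSE if a confirmed or unconfirmed record exists
 * for this clientid but was created with different creds than the request;
 * otherwise NFS4_OK. Either record pointer may be NULL (no such record). */
static int confirm_status(const struct client_model *conf,
			  const struct client_model *unconf,
			  const struct cred_model *rq)
{
	assert(rq != NULL);
	if (unconf && !same_creds_model(&unconf->cred, rq))
		return NFS4ERR_CLID_INUSE;
	if (conf && !same_creds_model(&conf->cred, rq))
		return NFS4ERR_CLID_INUSE;
	return NFS4_OK;
}
```

Whether CLID_INUSE is actually the right answer for the case in this trace is
a separate question, since the creds there are reported to be identical.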

> > > The SETCLIENTID and SETCLIENTID_CONFIRM are done with identical
> > > auth_unix creds.
> > >
> > > The clientid that we're looking up there was returned from the
> > > previous SETCLIENTID, generated by this logic:
> > >
> > >	if (conf && same_verf(&conf->cl_verifier, &clverifier))
> > >		/* case 1: probable callback update */
> > >		copy_clid(new, conf);
> > >	else /* case 4 (new client) or cases 2, 3 (client reboot): */
> > >		gen_clid(new, nn);
> > >
> > > So it should be a brand new clientid, unless the client was reusing
> > > the old verifier.
> > >
> > > So perhaps the client is sending the SETCLIENTID with a verifier set
> > > to what it used on the previous boot?  That sounds like a client
> > > bug.  The linux client uses a timestamp for the verifier, looks like
> > > the Solaris client might too.  Is there some reason the clock on
> > > this client isn't advancing on reboot?
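
For anyone following along, the clientid choice described above can be
modelled in a few lines. This is a sketch with hypothetical names
(struct client_rec, choose_clientid(), next_id), not the nfsd implementation.
If a confirmed record exists with the same verifier, the old clientid is
reused; otherwise a new one is handed out. The counter is only a stand-in for
however the server really generates ids.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define VERF_SIZE 8	/* NFSv4 verifiers are 8 opaque bytes */

/* Hypothetical, simplified record of a confirmed client. */
struct client_rec {
	uint64_t clientid;
	unsigned char verifier[VERF_SIZE];
};

/* Pick the clientid to return from SETCLIENTID.
 * conf:    existing confirmed record for this client string, or NULL
 * verf:    verifier sent in the new SETCLIENTID
 * next_id: counter standing in for the server's id generator (assumption) */
static uint64_t choose_clientid(const struct client_rec *conf,
				const unsigned char verf[VERF_SIZE],
				uint64_t *next_id)
{
	assert(next_id != NULL);
	if (conf && memcmp(conf->verifier, verf, VERF_SIZE) == 0)
		return conf->clientid;	/* same verifier: reuse old id */
	return (*next_id)++;		/* new client or reboot: fresh id */
}
```

In this model, a client that comes back after a reboot with its pre-reboot
verifier takes the first branch and gets its old clientid back, which is the
situation Bruce is speculating about.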
> >
> > probably NFS4ERR_STALE_CLIENTID is a better error code for this
> > scenario.
> 
> SERVERFAULT is obviously lame, but I don't know that STALE_CLIENTID is
> right either.
> 
> Another thing that's weird is:
> 
> 	> After a long time (several hours) the v4 mount suddenly works
> 	> again.
> 
> I'd expect the client to expire after a lease period (default 90 seconds);
> I don't know what could be happening that would take hours.
> 
> Also I don't know why those creds would change after a reboot.
> 
> Anyway I think a trace covering the reboot is still the best hope of an
> explanation.

Frank


