From: "Frank Filz" <ffilzlnx@mindspring.com>
To: "'J. Bruce Fields'" <bfields@fieldses.org>,
	"'Mkrtchyan, Tigran'" <tigran.mkrtchyan@desy.de>
Cc: "'Ulrich Gemkow'" <ulrich.gemkow@ikr.uni-stuttgart.de>,
	<linux-nfs@vger.kernel.org>
Subject: RE: NFSv4 mount fails on Sun Solaris 10 after reboot of client
Date: Thu, 27 Aug 2015 13:36:38 -0700
Message-ID: <008901d0e108$13caa520$3b5fef60$@mindspring.com>
In-Reply-To: <20150827182922.GB11819@fieldses.org>

> On Thu, Aug 27, 2015 at 08:43:51AM +0200, Mkrtchyan, Tigran wrote:
> >
> >
> > ----- Original Message -----
> > > From: "J. Bruce Fields" <bfields@fieldses.org>
> > > To: "Ulrich Gemkow" <ulrich.gemkow@ikr.uni-stuttgart.de>
> > > Cc: linux-nfs@vger.kernel.org
> > > Sent: Tuesday, August 25, 2015 11:54:56 PM
> > > Subject: Re: NFSv4 mount fails on Sun Solaris 10 after reboot of
> > > client
> >
> > > On Tue, Aug 25, 2015 at 07:28:03PM +0200, Ulrich Gemkow wrote:
> > >> Hello Bruce,
> > >>
> > >> On Monday 24 August 2015 22:14:01 J. Bruce Fields wrote:
> > >> > On Mon, Aug 24, 2015 at 02:52:55PM +0200, Ulrich Gemkow wrote:
> > >> > > we have a weird problem with Linux NFSv4.0 Server (Vanilla
> > >> > > Kernel 4.1.6) and a Sun Solaris 10 client (all patches applied):
> > >> > >
> > >> > > When mounting a share on the Solaris client and then rebooting
> > >> > > the client without unmounting the share first, after the reboot
> > >> > > every attempt to mount the share again gives an I/O error on
> > >> > > the client and the mount fails.
> > >> > >
> > >> > > After a long time (several hours) the v4 mount suddenly works
> > >> > > again.
> > >> > >
> > >> > > Mounting a share with vers=2 works always even in times when
> > >> > > the v4 mount fails.
> > >> > >
> > >> > > So it seems the Linux NFSv4 server holds a state for the client
> > >> > > which prevents the re-mounting of the share and gives the
> > >> > > I/O-error on the client.
> > >> > >
> > >> > > We use NFSv4 without idmapd.
> > >> > >
> > >> > > Is there any tip how to debug or solve this?
> > >> >
> > >> > Best is probably to get a packet trace.  So something like:
> > >> >
> > >> > 	tcpdump -s0 -iem0 -wtmp.pcap
> > >> >
> > >> > and then try the client mount, then kill the tcpdump after the
> > >> > mount fails, and send us tmp.pcap.  (And/or take a look at
> > >> > tmp.pcap yourself with wireshark.  The interesting question is
> > >> > what kind of error the server is returning when the client tries
> > >> > the mount after reboot.)
> > >>
> > >> Thank you for your reply. The tcpdump is attached, the relevant
> > >> packets are 49..52. The error seems to be a SERVERFAULT. Can you
> > >> see more from the dump?
> > >>
> > >> Thanks again and best regards
> > >
> > > The SERVERFAULT is on SETCLIENTID_CONFIRM.
> > >
> > > In nfsd4_setclientid_confirm():
> > >
> > >	conf = find_confirmed_client(clid, false, nn);
> > >	unconf = find_unconfirmed_client(clid, false, nn);
> > >	/*
> > >	 * We try hard to give out unique clientid's, so if we get an
> > >	 * attempt to confirm the same clientid with a different cred,
> > >	 * there's a bug somewhere.  Let's charitably assume it's our
> > >	 * bug.
> > >	 */
> > >	status = nfserr_serverfault;
> > >	if (unconf && !same_creds(&unconf->cl_cred, &rqstp->rq_cred))
> > >		goto out;
> > >	if (conf && !same_creds(&conf->cl_cred, &rqstp->rq_cred))
> > >		goto out;

If the creds don't match, the return should be NFS4ERR_CLID_INUSE, per the
first bullet after the DRC discussion in section 16.34.5 (IMPLEMENTATION).

At least the way I read RFC 7530...
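
To make that concrete, here is a rough standalone sketch of the check with
the status I'd expect. It is a userspace model, not a patch against nfsd:
the model_* types and helpers are made-up stand-ins for struct svc_cred,
struct nfs4_client and same_creds(), and comparing only uid/gid is a
simplification. The status numbers are the ones assigned in RFC 7530.

```c
#include <stdbool.h>
#include <stddef.h>

/* Status codes as numbered in RFC 7530. */
enum nfsstat4 {
	NFS4_OK             = 0,
	NFS4ERR_SERVERFAULT = 10006,	/* what the quoted code returns today */
	NFS4ERR_CLID_INUSE  = 10017,	/* what is being suggested instead */
};

/* Hypothetical stand-in for the kernel's struct svc_cred. */
struct model_cred {
	unsigned int uid;
	unsigned int gid;
};

/* Hypothetical stand-in for struct nfs4_client; only the cred matters here. */
struct model_client {
	struct model_cred cl_cred;
};

/* Simplified comparison; the real same_creds() looks at more than uid/gid. */
static bool model_same_creds(const struct model_cred *a,
			     const struct model_cred *b)
{
	return a->uid == b->uid && a->gid == b->gid;
}

/*
 * Model of the credential check at the top of SETCLIENTID_CONFIRM.
 * conf and unconf may be NULL when no matching record exists.
 */
static enum nfsstat4 model_confirm_cred_check(const struct model_client *conf,
					      const struct model_client *unconf,
					      const struct model_cred *rq_cred)
{
	if (unconf && !model_same_creds(&unconf->cl_cred, rq_cred))
		return NFS4ERR_CLID_INUSE;
	if (conf && !model_same_creds(&conf->cl_cred, rq_cred))
		return NFS4ERR_CLID_INUSE;
	return NFS4_OK;	/* the remaining confirm logic would continue from here */
}
```

In this model a mismatch on either record yields NFS4ERR_CLID_INUSE, and
matching or absent records fall through to the rest of the confirm logic.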

> > > The SETCLIENTID and SETCLIENTID_CONFIRM are done with identical
> > > auth_unix creds.
> > >
> > > The clientid that we're looking up there was returned from the
> > > previous SETCLIENTID, generated by this logic:
> > >
> > >	if (conf && same_verf(&conf->cl_verifier, &clverifier))
> > >		/* case 1: probable callback update */
> > >		copy_clid(new, conf);
> > >	else /* case 4 (new client) or cases 2, 3 (client reboot): */
> > >		gen_clid(new, nn);
> > >
> > > So it should be a brand new clientid, unless the client was reusing
> > > the old verifier.
> > >
> > > So perhaps the client is sending the SETCLIENTID with a verifier set
> > > to what it used on the previous boot?  That sounds like a client
> > > bug.  The linux client uses a timestamp for the verifier, looks like
> > > the Solaris client might too.  Is there some reason the clock on
> > > this client isn't advancing on reboot?
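
The verifier hypothesis can be shown with a small standalone model of that
SETCLIENTID branch. This is only a sketch under stated assumptions: the
model_* names and the simple counter used for new clientids are invented
for illustration and are not how nfsd's copy_clid()/gen_clid() work
internally. The 8-byte verifier size matches the protocol.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MODEL_VERIFIER_SIZE 8	/* NFSv4 verifiers are 8 opaque bytes */

struct model_verifier {
	unsigned char data[MODEL_VERIFIER_SIZE];
};

/* Hypothetical confirmed-client record: clientid plus the boot verifier. */
struct model_conf_rec {
	uint64_t clientid;
	struct model_verifier cl_verifier;
};

/* Hypothetical source of fresh clientids (stands in for gen_clid()). */
static uint64_t model_next_clientid = 1;

static bool model_same_verf(const struct model_verifier *a,
			    const struct model_verifier *b)
{
	return memcmp(a->data, b->data, MODEL_VERIFIER_SIZE) == 0;
}

/*
 * conf is the confirmed record for the same client name, or NULL.
 * Returns the clientid the server would hand back from SETCLIENTID.
 */
static uint64_t model_setclientid(const struct model_conf_rec *conf,
				  const struct model_verifier *clverifier)
{
	if (conf && model_same_verf(&conf->cl_verifier, clverifier))
		return conf->clientid;		/* case 1: old clientid reused */
	return model_next_clientid++;		/* new client or client reboot */
}
```

In this model, a client that comes back after a reboot with an unchanged
verifier gets its old clientid again, while a changed verifier always
yields a fresh one.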
> >
> > probably NFS4ERR_STALE_CLIENTID is a better error code for this
> > scenario.
> 
> SERVERFAULT is obviously lame, but I don't know that STALE_CLIENTID is
> right either.
> 
> Another thing that's weird is:
> 
> 	> After a long time (several hours) the v4 mount suddenly works
> 	> again.
> 
> I'd expect the client to expire after a lease period (default 90 seconds);
> I don't know what could be happening that would take hours.
> 
> Also I don't know why those creds would change after a reboot.
> 
> Anyway I think a trace covering the reboot is still the best hope of an
> explanation.

Frank

