From: Aaron Pace <Aaron.Pace@alcatel-lucent.com>
To: <linux-nfs@vger.kernel.org>
Subject: Type mismatch causing stale client loop
Date: Tue, 20 Jan 2015 01:01:47 -0700
Message-ID: <54BE0B6B.5090609@alcatel-lucent.com>
Hello,
I didn't see this issue reported already, but then, I didn't do a
terribly exhaustive search, so my apologies if this is already known.
I noticed that I was getting looping stale client errors while trying to
mount an NFS share (example below):
[ 965.926293] nfsd_dispatch: vers 4 proc 1
[ 965.973373] nfsv4 compound op #1/1: 35 (OP_SETCLIENTID)
[ 966.036158] renewing client (clientid 6f1df70d/00002580)
[ 966.099880] nfsv4 compound op ffff880450d51080 opcnt 1 #1: 35: status 0
[ 966.179190] nfsv4 compound returned 0
[ 966.223447] nfsd_dispatch: vers 4 proc 1
[ 966.270475] nfsv4 compound op #1/1: 36 (OP_SETCLIENTID_CONFIRM)
[ 966.341487] NFSD stale clientid (6f1df70d/00002580) boot_time 16f1df70d
[ 966.420791] nfsv4 compound op ffff880450d51080 opcnt 1 #1: 36: status 10022
[ 966.504419] nfsv4 compound returned 10022
[ 966.552738] nfsd_dispatch: vers 4 proc 1
The 'stale' error comes from nfs4state.c:
static int
STALE_CLIENTID(clientid_t *clid, struct nfsd_net *nn)
{
	if (clid->cl_boot == nn->boot_time)
		return 0;
	dprintk("NFSD stale clientid (%08x/%08x) boot_time %08lx\n",
		clid->cl_boot, clid->cl_id, nn->boot_time);
	return 1;
}
I thought to myself -- 'Self, it seems statistically unlikely that a
legitimately mismatching cl_boot and nn->boot_time would have identical
lower 32 bits.'
As it turns out, nn->boot_time is defined as time_t (a signed long, 64
bits on a 64-bit platform), while cl_boot is defined as a u32.
My system time, as you may have guessed, was wildly invalid (2025-ish).
However, this does appear to be a legitimate issue in a 64-bit kernel
that will crop up in a few years. I was working in 3.10, but I verified
that the definitions are identical in the current 3.19 release candidate.
Sadly, I don't have the bandwidth (or the expertise) to really
understand the ramifications of what seems to be the logical next step,
changing cl_boot to be time_t instead of u32. I am hoping that this
will be trivial to look at for someone on this list.
Thanks,
-Aaron Pace