From: Jeff Layton <jlayton@poochiereds.net>
To: Nick Bowler <nbowler@draconx.ca>
Cc: "J. Bruce Fields" <bfields@fieldses.org>, linux-nfs@vger.kernel.org
Subject: Re: PROBLEM: nfs I/O errors with sqlite applications
Date: Fri, 29 Jul 2016 13:52:15 -0400
Message-ID: <1469814735.19411.1.camel@poochiereds.net>
In-Reply-To: <CADyTPEx=h95ODeG3BixMHc=kxLmkFt+aVyS+V_bK-b=CqK4_6Q@mail.gmail.com>
On Fri, 2016-07-29 at 12:43 -0400, Nick Bowler wrote:
> Hi guys,
>
> On 2015-10-13, Nick Bowler <nbowler@draconx.ca> wrote:
> > On 2015-10-13, Jeff Layton <jlayton@poochiereds.net> wrote:
> > > On Mon, 12 Oct 2015 23:01:36 -0400
> > > Nick Bowler <nbowler@draconx.ca> wrote:
> > > > On 2015-10-12 15:46 -0400, J. Bruce Fields wrote:
> > > > > On Mon, Oct 12, 2015 at 03:25:38PM -0400, bfields wrote:
> > > > > > On Mon, Oct 12, 2015 at 12:48:56PM -0400, Nick Bowler wrote:
> [...]
> > > > > > > the failing syscall seems to be:
> > > > > > >
> > > > > > > fcntl(7, F_SETLK, {type=F_RDLCK, whence=SEEK_SET,
> > > > > > > start=1073741824, len=1}) = -1 EIO (Input/output error)
> > > > > > >
> > > > > > > When the issue occurs, the client dmesg log is full of messages of
> > > > > > > the form:
> > > > > > >
> > > > > > > [3441972.381211] NFS: v4 server returned a bad sequence-id error
> > > > > > > on an unconfirmed sequence ffff88007612ae20!
> > > > > > >
> > > > > > > There are no unusual messages on the server.
> > > > [...]
> > > Ok, makes sense. The log shows that it occurred in a fcntl call, so
> > > it's probably this from lookup_or_create_lock_state:
> > >
> > >         lo = find_lockowner_str(cl, &lock->lk_new_owner);
> > >         if (!lo) {
> > >                 strhashval = ownerstr_hashval(&lock->lk_new_owner);
> > >                 lo = alloc_init_lock_stateowner(strhashval, cl, ost,
> > >                                                 lock);
> > >                 if (lo == NULL)
> > >                         return nfserr_jukebox;
> > >         } else {
> > >                 /* with an existing lockowner, seqids must be the same */
> > >                 status = nfserr_bad_seqid;
> > >                 if (!cstate->minorversion &&
> > >                     lock->lk_new_lock_seqid != lo->lo_owner.so_seqid)
> > >                         goto out;
> > >         }
> > >
> > > ...so we found an existing lockowner, but the seqid in the call is
> > > wrong. It seems like the client ought to try to recover in this case,
> > > but I don't see where it handles BAD_SEQID errors in the locking code.
> [...]
> > >
> > > In any case, the question now is whether this is a client or server
> > > bug. What would tell us that is a network capture of the NFS traffic
> > > between client and server at the time that this occurs. Would it be
> > > possible to collect one? If so, then let Bruce and me know and we can
> > > figure out a way to share it privately.
>
> Hi guys,
>
> Unfortunately I did not manage to perform a network capture last time
> due to power loss. I did not hit this issue again until yesterday (~9
> months later), this time after 45 days of uptime.
>
> Kernel versions now are: 4.5.1 on the server, and 4.4.3 on the client.
>
> Since it's now in a failing state again (this situation persists until
> a reboot of the client), I captured with strace and tcpdump (on both
> client and server) while attempting to start gmpc. The result is quite
> small (just 30 packets). Will that be helpful?
>
> Thanks,
> Nick
I doubt we'd be able to tell much after the fact, but feel free to send it along.
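
While the client is still in the failing state, it may also be worth retrying just that lock request outside of gmpc. A minimal sketch, assuming the parameters from the strace output quoted above (F_RDLCK, SEEK_SET, start=1073741824, len=1). The helper name try_pending_byte_lock is made up for illustration, and this has not been run against the affected setup:

```c
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/*
 * Hypothetical helper (not from the thread): issue the same non-blocking
 * read lock that strace showed failing. Returns 0 if the lock was granted,
 * otherwise the errno value set by fcntl().
 */
static int try_pending_byte_lock(int fd)
{
	struct flock fl;

	memset(&fl, 0, sizeof(fl));
	fl.l_type = F_RDLCK;		/* shared (read) lock */
	fl.l_whence = SEEK_SET;		/* l_start is an absolute offset */
	fl.l_start = 1073741824;	/* offset taken from the strace output */
	fl.l_len = 1;			/* lock a single byte */

	if (fcntl(fd, F_SETLK, &fl) == -1)
		return errno;
	return 0;
}
```

Called on a descriptor opened read-write on a file under the NFS mount, a return value of EIO would match the strace output, and 0 would mean the lock was granted.
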
Thanks,
--
Jeff Layton <jlayton@poochiereds.net>