linux-nfs.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Bruce Fields <bfields@fieldses.org>
To: Chuck Lever <chuck.lever@oracle.com>
Cc: Trond Myklebust <trondmy@primarydata.com>,
	Bruce Fields <bfields@redhat.com>,
	Linux NFS Mailing List <linux-nfs@vger.kernel.org>
Subject: Re: NFSv4.1 regression with v4.15-rc
Date: Fri, 12 Jan 2018 17:07:49 -0500	[thread overview]
Message-ID: <20180112220749.GA18468@fieldses.org> (raw)
In-Reply-To: <ACF7A175-BFE5-4D3D-9EA7-F76DAD004ECA@oracle.com>

On Tue, Jan 09, 2018 at 03:28:23PM -0500, Chuck Lever wrote:
> 
> 
> > On Jan 9, 2018, at 3:26 PM, Trond Myklebust <trondmy@primarydata.com> wrote:
> > 
> > On Sun, 2017-12-31 at 13:35 -0500, Chuck Lever wrote:
> >>> On Dec 30, 2017, at 1:14 PM, Chuck Lever <chuck.lever@oracle.com>
> >>> wrote:
> >>> 
> >>>> 
> >>>> On Dec 30, 2017, at 1:05 PM, Bruce Fields <bfields@fieldses.org>
> >>>> wrote:
> >>>> 
> >>>> On Wed, Dec 27, 2017 at 03:40:58PM -0500, Chuck Lever wrote:
> >>>>> Last week I updated my test server from v4.14 to v4.15-rc4, and
> >>>>> began to
> >>>>> observe intermittent failures in the git regression suite on
> >>>>> NFSv4.1.
> >>>> 
> >>>> I haven't run that before.  Should I just
> >>>> 
> >>>> 	mount -overs=4.1 server:/fs /mnt/
> >>>> 	cd /mnt/
> >>>> 	git clone git://git.kernel.org/pub/scm/git/git.git
> >>>> 	cd git
> >>>> 	make test
> >>>> 
> >>>> ?
> >>> 
> >>> You'll need to install SVN and CVS on your client as well.
> >>> The failures seem to occur only in the SVN/CVS related
> >>> tests.
> >>> 
> >>> 
> >>>>> I
> >>>>> was able to reproduce these failures with NFSv4.1 on both TCP
> >>>>> and RDMA,
> >>>>> yet there has not been a reproduction with NFSv3 or NFSv4.0.
> >>>>> 
> >>>>> The server hardware is a single-socket 4-core system with 32GB
> >>>>> of RAM.
> >>>>> The export is a tmpfs. Networking is 56Gb InfiniBand (or
> >>>>> IPoIB).
> >>>>> 
> >>>>> The git regression suite reports individual test failures in
> >>>>> the SVN
> >>>>> and CVS tests. On occasion, the client mount point freezes,
> >>>>> requiring
> >>>>> that the client be rebooted in order to unstick the mount.
> >>>>> 
> >>>>> Just before Christmas, I bisected the problem to:
> >>>> 
> >>>> Thanks for the report!  I'll make some time for this next
> >>>> week.  What's
> >>>> your client?
> >> 
> >> Oops, I didn't answer this question. The client is v4.15-rc4.
> >> 
> >> 
> >>>> I guess one start might be to see if the reproducer can be
> >>>> simplified e.g. by running just one of the tests from the suite.
> >>> 
> >>> The failures are intermittent, and occur in a different test
> >>> each time. You have to wait for the 9000-series scripts, which
> >>> test SVN/CVS repo operations. To speed up time-to-failure, use
> >>> "make -jN test" where N is more than a few.
> >>> 
> >>> My client and server both have multiple real cores. I'm
> >>> thinking it's the server that matters here (possibly a race
> >>> condition is introduced by the below commit?).
> >>> 
> >>> 
> >>>> --b.
> >>>> 
> >>>>> 
> >>>>> commit 659aefb68eca28ba9aa482a9fc64de107332e256
> >>>>> Author: Trond Myklebust <trond.myklebust@primarydata.com>
> >>>>> Date:   Fri Nov 3 08:00:13 2017 -0400
> >>>>> 
> >>>>>  nfsd: Ensure we don't recognise lock stateids after freeing
> >>>>> them
> >>>>> 
> >>>>>  In order to deal with lookup races, nfsd4_free_lock_stateid()
> >>>>> needs
> >>>>>  to be able to signal to other stateful functions that the
> >>>>> lock stateid
> >>>>>  is no longer valid. Right now, nfsd_lock() will check whether
> >>>>> or not an
> >>>>>  existing stateid is still hashed, but only in the "new lock"
> >>>>> path.
> >>>>> 
> >>>>>  To ensure the stateid invalidation is also recognised by the
> >>>>> "existing lock"
> >>>>>  path, and also by a second call to nfsd4_free_lock_stateid()
> >>>>> itself, we can
> >>>>>  change the type to NFS4_CLOSED_STID under the stp->st_mutex.
> >>>>> 
> >>>>>  Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.c
> >>>>> om>
> >>>>>  Signed-off-by: J. Bruce Fields <bfields@redhat.com>
> >>>>> 
> >>>>> 
> > 
> > So, I'm thinking that release_open_stateid_locks() and
> > nfsd4_release_lockowner() should probably be setting NFS4_CLOSED_STID
> > when they call unhash_lock_stateid() (sorry for missing that).
> 
> Send me a patch and I can test it.

I'm pretty confused by that patch.  Among other things, it's setting
sc_type to NFS4_CLOSED_STID immediately before calling
release_lock_stateid() which itself will set sc_type to 0.

--b.

      reply	other threads:[~2018-01-12 22:07 UTC|newest]

Thread overview: 7+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-12-27 20:40 NFSv4.1 regression with v4.15-rc Chuck Lever
2017-12-30 18:05 ` Bruce Fields
2017-12-30 18:14   ` Chuck Lever
2017-12-31 18:35     ` Chuck Lever
2018-01-09 20:26       ` Trond Myklebust
2018-01-09 20:28         ` Chuck Lever
2018-01-12 22:07           ` Bruce Fields [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20180112220749.GA18468@fieldses.org \
    --to=bfields@fieldses.org \
    --cc=bfields@redhat.com \
    --cc=chuck.lever@oracle.com \
    --cc=linux-nfs@vger.kernel.org \
    --cc=trondmy@primarydata.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).