From: Scott Mayhew <smayhew@redhat.com>
To: Trond Myklebust <trondmy@hammerspace.com>
Cc: "linux-nfs@vger.kernel.org" <linux-nfs@vger.kernel.org>
Subject: Re: Question about open(CLAIM_FH)
Date: Thu, 18 Apr 2019 16:43:56 -0400
Message-ID: <20190418204356.GA15226@coeurl.usersys.redhat.com>
In-Reply-To: <213d4ead8a7ae890dadc7785d59117e798f94748.camel@hammerspace.com>

On Thu, 18 Apr 2019, Trond Myklebust wrote:

> Hi Scott,
> 
> On Thu, 2019-04-18 at 09:37 -0400, Scott Mayhew wrote:
> > When the client does an open(CLAIM_FH) and the server already has open
> > state for that open owner and file, what's supposed to happen?
> > Currently the server returns the existing stateid with the seqid bumped,
> > but it looks like the client is expecting a new stateid (I'm seeing the
> > state manager spending a lot of time waiting in
> > nfs_set_open_stateid_locked() due to NFS_STATE_CHANGE_WAIT being set in
> > the state flags by nfs_need_update_open_stateid()).
> > 
> > Looking at rfc5661 section 18.16.3, I see:
> > 
> >    | CLAIM_NULL, CLAIM_FH | For the client, this is a new OPEN request |
> >    |                      | and there is no previous state associated  |
> >    |                      | with the file for the client.  With        |
> >    |                      | CLAIM_NULL, the file is identified by the  |
> >    |                      | current filehandle and the specified       |
> >    |                      | component name.  With CLAIM_FH (new to     |
> >    |                      | NFSv4.1), the file is identified by just   |
> >    |                      | the current filehandle.                    |
> > 
> > So it seems like maybe the server should be tossing the old state and
> > returning a new stateid?
> > 
> 
> No. As far as the protocol is concerned, the only difference between
> CLAIM_NULL and CLAIM_FH is through how the client identifies the file
> (in the first case, through an implicit lookup, and in the second case
> through a file handle). The client should be free to intermix the two
> types of OPEN, and it should expect the resulting stateids to depend
> only on whether or not the open_owner matches. If the open_owner
> matches an existing stateid, then that stateid is bumped and returned.
> 
> I'm not aware of any expectation in the client that this should not be
> the case, so if you are seeing different behaviour, then something else
> must be at work here. Is the client perhaps mounting the same
> filesystem in two different places in such a way that the super block
> is not being shared?

No, it's just a single 4.1 mount w/ the default mount options.
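
To make sure I'm reading the rule above correctly, here is a minimal toy
model of it. This is only a sketch of the protocol behaviour as described,
not nfsd code: ToyServer, Stateid, and the (open_owner, filehandle) key are
names I made up for illustration, and the claim type is deliberately ignored
once the file has been resolved to a filehandle.

```python
from dataclasses import dataclass, field
from itertools import count


@dataclass
class Stateid:
    other: int  # opaque identifier chosen by the server
    seqid: int  # bumped on every state change for this stateid


@dataclass
class ToyServer:
    # Hypothetical model: open state is keyed by (open_owner, filehandle).
    open_state: dict = field(default_factory=dict)
    _next_other: count = field(default_factory=lambda: count(1))

    def open(self, open_owner: str, filehandle: str, claim: str) -> Stateid:
        # The claim type (CLAIM_NULL or CLAIM_FH) only affects how the file
        # was identified; it plays no part in choosing the stateid.
        if claim not in ("CLAIM_NULL", "CLAIM_FH"):
            raise ValueError(f"unsupported claim type: {claim}")
        key = (open_owner, filehandle)
        st = self.open_state.get(key)
        if st is None:
            # No existing open state for this owner and file: new stateid.
            st = Stateid(other=next(self._next_other), seqid=1)
            self.open_state[key] = st
        else:
            # Existing open state: same stateid, seqid bumped.
            st.seqid += 1
        return Stateid(st.other, st.seqid)  # return a copy, as on the wire


srv = ToyServer()
a = srv.open("owner1", "fh-A", "CLAIM_NULL")
b = srv.open("owner1", "fh-A", "CLAIM_FH")   # same owner, same file
c = srv.open("owner2", "fh-A", "CLAIM_FH")   # different owner
```

In this model the second open returns the first stateid with seqid + 1
regardless of claim type, and only a different open owner gets a new one.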

For a bit of background, I've been trying to track down a problem in
RHEL where the SEQ4_STATUS_RECALLABLE_STATE_REVOKED flag is getting
permanently set because the nfs4_client->cl_revoked list on the server
is non-empty... yet there's no longer open state on the client.

I can reproduce it pretty easily in RHEL using 2 VMs, each with 2-4 CPUs
and 4-8G of memory.  The server has 64 nfsd threads and a 15 second
lease time.

On the client I'm running the following to add a 10ms delay to CB_RECALL
replies:
# stap -gve 'global count = 0; probe module("nfsv4").function("nfs4_callback_recall") { printf("%s: %d\n", ppfunc(), ++count); mdelay(10); }'

then in another window I open a bunch of files:
# for i in `seq -w 1 5000`; do sleep 2m </mnt/t/dir1/file.$i & done

(Note: I already created the files ahead of time)

As soon as the bash prompt returns on the client, I run the following on
the server:
# for i in `seq -w 1 5000`; do date >/export/dir1/file.$i & done

At that point, every further SEQUENCE reply the client receives has the
recallable state revoked flag set, until the fs is unmounted.
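
The condition I'm describing can be modelled roughly as below. This is a toy
sketch, not the nfsd implementation: ToyClient and its methods are
hypothetical names, and I'm assuming that FREE_STATEID on a revoked
delegation is what takes its entry off cl_revoked. The flag value is the one
from rfc5661.

```python
SEQ4_STATUS_RECALLABLE_STATE_REVOKED = 0x00000040  # value from rfc5661


class ToyClient:
    """Toy stand-in for the server's per-client state (hypothetical)."""

    def __init__(self):
        self.cl_revoked = []  # stateids of delegations the server revoked

    def revoke_delegation(self, stateid):
        # The recall was not answered in time, so the server revokes.
        self.cl_revoked.append(stateid)

    def free_stateid(self, stateid):
        # Assumption: FREE_STATEID removes the revoked entry from the list.
        if stateid in self.cl_revoked:
            self.cl_revoked.remove(stateid)
            return True
        return False

    def sequence_status_flags(self):
        # The flag is reported for as long as cl_revoked is non-empty.
        return SEQ4_STATUS_RECALLABLE_STATE_REVOKED if self.cl_revoked else 0


clp = ToyClient()
clp.revoke_delegation("deleg-1")
clp.revoke_delegation("deleg-2")
clp.free_stateid("deleg-1")
flags_after_one = clp.sequence_status_flags()   # one entry still on the list
clp.free_stateid("deleg-2")
flags_after_both = clp.sequence_status_flags()  # list is empty again
```

In other words, a single entry that never gets freed is enough to keep the
flag set on every SEQUENCE reply.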

If I run the same steps on Fedora clients with recent kernels, I don't
have the problem with the recallable state revoked flag, but I'm getting
some other strangeness.  Everything starts out fine with
nfs_reap_expired_delegations() doing TEST_STATEID and FREE_STATEID, but
once the state manager starts calling nfs41_open_expired(), things sort
of grind to a halt and I see 1 OPEN and 1 or 2 TEST_STATEID ops every 5
seconds in wireshark.  It stays that way until the files are closed on
the client, when I see a slew of DELEGRETURNs and FREE_STATEIDs... but
I'm only seeing 3 or 4 CLOSE ops.  If I poke around in crash on the
server, I see a ton of open stateids:

crash> epython fs/nfsd/print-client-state-info.py
nfsd_net = 0xffff93e473511000
        nfs4_client = 0xffff93e3f7954980
                nfs4_stateowner = 0xffff93e4058cc360 num_stateids = 4997 <---- only 3 CLOSE ops were received
                num_openowners = 1
                num_layouts = 0
                num_delegations = 0
                num_sessions = 1
                num_copies = 0
                num_revoked = 0
                cl_cb_waitq_qlen = 0

Those stateids stick around until the fs is unmounted (and the
DESTROY_CLIENTID ops return NFS4ERR_CLIENTID_BUSY while doing so).

Both VMs are running 5.0.6-200.fc29.x86_64, but the server also has the
"nfsd: Don't release the callback slot unless it was actually held"
patch you sent a few weeks ago as well as the "nfsd: CB_RECALL can race
with FREE_STATEID" patch I sent today.

-Scott

> 
> Cheers
>   Trond
> 
> -- 
> Trond Myklebust
> Linux NFS client maintainer, Hammerspace
> trond.myklebust@hammerspace.com
> 
> 
