From: Anna Schumaker <Anna.Schumaker@netapp.com>
To: Benjamin Coddington <bcodding@redhat.com>
Cc: Trond Myklebust <trondmy@primarydata.com>,
List Linux NFS Mailing <linux-nfs@vger.kernel.org>,
Oleg Drokin <green@linuxhacker.ru>
Subject: Re: [PATCH v7 13/31] NFSv4.1: Ensure we always run TEST/FREE_STATEID on locks
Date: Thu, 10 Nov 2016 15:54:46 -0500
Message-ID: <50b6aeb9-cb21-9f46-dadd-e7ba0f5d86ed@Netapp.com>
In-Reply-To: <BDCCD810-781F-4DD6-91E8-279A2C3377EF@redhat.com>
On 11/10/2016 03:18 PM, Benjamin Coddington wrote:
>
> On 10 Nov 2016, at 10:58, Benjamin Coddington wrote:
>
>> Hi Anna,
>>
>> On 10 Nov 2016, at 10:01, Anna Schumaker wrote:
>>> Do you have an estimate for when this patch will be ready? I want to include it in my next bugfix pull request for 4.9.
>>
>> I haven't posted because I am still trying to get to the bottom of another
>> problem where the client gets stuck in a loop sending the same stateid over
>> and over on NFS4ERR_OLD_STATEID. I want to make sure this problem isn't
>> caused by this fix -- which I don't think it is, but I'd rather make sure.
>> If I don't make any progress on this problem by the end of today, I'll post
>> what I have.
>>
>> Read on if interested in this new problem:
>>
>> It looks like racing opens with the same openowner can be returned out of
>> order by the server, so the client sees stateid seqid of 2 before 1. Then a
>> LOCK sent with seqid 1 is endlessly retried if sent while doing recovery.
>>
>> It's hard to tell whether I captured all the moving parts needed to
>> describe this problem, though, because it takes a very long time for me
>> to reproduce and the packet captures were dropping frames. I'm working on
>> manually reproducing it now.
>
> Anna,
>
> I haven't gotten to the bottom of it, and so I'm not confident it isn't a
> problem created by the fix I've been testing, which is:
>
> diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c
> index e809498..2aa9d86 100644
> --- a/fs/nfs/nfs4proc.c
> +++ b/fs/nfs/nfs4proc.c
> @@ -2564,12 +2564,15 @@ static void nfs41_check_delegation_stateid(struct nfs4_state *state)
>  static int nfs41_check_expired_locks(struct nfs4_state *state)
>  {
>  	int status, ret = NFS_OK;
> -	struct nfs4_lock_state *lsp;
> +	struct nfs4_lock_state *lsp, *tmp;
>  	struct nfs_server *server = NFS_SERVER(state->inode);
>
>  	if (!test_bit(LK_STATE_IN_USE, &state->flags))
>  		goto out;
> -	list_for_each_entry(lsp, &state->lock_states, ls_locks) {
> +	spin_lock(&state->state_lock);
> +	list_for_each_entry_safe(lsp, tmp, &state->lock_states, ls_locks) {
> +		atomic_inc(&lsp->ls_count);
> +		spin_unlock(&state->state_lock);
>  		if (test_bit(NFS_LOCK_INITIALIZED, &lsp->ls_flags)) {
>  			struct rpc_cred *cred = lsp->ls_state->owner->so_cred;
>
> @@ -2588,7 +2591,10 @@ static int nfs41_check_expired_locks(struct nfs4_state *state)
>  				break;
>  			}
>  		}
> -	};
> +		nfs4_put_lock_state(lsp);
> +		spin_lock(&state->state_lock);
> +	}
> +	spin_unlock(&state->state_lock);
>  out:
>  	return ret;
>  }
>
> http://people.redhat.com/bcodding/old_stateid_loop is tshark output of my
> only good wirecapture of the problem. Without this patch, generic/089
> crashes long before this problem is reproduced, so I am stuck figuring it
> out, I'm afraid. Don't wait on my account.
>
> I plan on trying a bit more to reproduce tomorrow, and if I cannot, I'll
> write about it under separate cover.
Sounds good. Thanks for the update!
Anna
>
> Ben