* nfs4_state_manager() vs. nfs_server_remove_lists()
@ 2014-07-29 18:39 Steve Dickson
2014-07-29 19:52 ` Trond Myklebust
From: Steve Dickson @ 2014-07-29 18:39 UTC (permalink / raw)
To: Trond Myklebust; +Cc: Linux NFS Mailing list, Andy Adamson
Hello,

I've been seeing a panic where nfs4_state_manager()
ends up processing a v3 NFS client pointer.

The panic happens at the top of nfs4_state_manager()
because clp->cl_mvops == NULL.

Looking at the pointer (via crash), it is clearly
a v3 client pointer (i.e. rpc_ops == nfs_v3_clientops).

The reason we are in the state manager code is that an NFSv4
mount is doing server discovery, so it is walking the client list
in nfs41_walk_client_list().

Looking at the entire stack with crash, the
only time that v3 client pointer appears is after
nfs41_walk_client_list() has been called, so I'm 99%
sure the pointer is coming from the cl_share_link list.

So the question is: how did that v3 client pointer end up on that
list in a non-NFS_CS_READY state?

Well, a v3 mount is happening at the same time. In nfs_fs_mount_common()
it notices there is already an existing super block, so it decides to
free its server pointer, and nfs_server_remove_lists() is called.

What nfs_server_remove_lists() and nfs41_walk_client_list()
have in common is the nfs_client_lock spin lock.

Also, the client pointer in the server pointer being freed is
in a non-NFS_CS_READY state.

To answer the question: the v3 client pointer, in a non-NFS_CS_READY
state, is found by nfs41_walk_client_list()
because the walk beat nfs_server_remove_lists() to the
nfs_client_lock spin lock.

nfs41_walk_client_list() finds the uninitialized client
pointer that nfs_server_remove_lists() is trying to free,
processes it, and then falls over...

Note this was very hard to reproduce: it needs a very large client
(many cores), a very fast server, and a few
hours...

Question: since both v3 and v4 clients are on the cl_share_link
list, should there be a check in nfs41_walk_client_list() to
process only v4 clients?
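
To make the question concrete, here is a minimal user-space sketch of such a guard. It is not the kernel code: the filt_* types and functions are hypothetical stand-ins, and the only assumptions carried over are that each client's rpc_ops records the protocol version and that cl_mvops is NULL on a v3 client.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, cut-down stand-ins for the kernel structures. */
struct filt_rpc_ops {
	unsigned int version;		/* assumed: 3 for NFSv3, 4 for NFSv4.x */
};

struct filt_client {
	const struct filt_rpc_ops *rpc_ops;
	const void *cl_mvops;		/* NULL on a v3 client */
};

const struct filt_rpc_ops filt_v3_ops = { 3 };
const struct filt_rpc_ops filt_v4_ops = { 4 };

/* Count the entries a v4-only walk would hand on for further processing. */
size_t filt_count_v4(const struct filt_client *list, size_t n)
{
	size_t i, count = 0;

	assert(list != NULL);
	for (i = 0; i < n; i++) {
		/* The guard in question: never touch a non-v4 entry. */
		if (list[i].rpc_ops->version != 4)
			continue;
		/* Anything that survives the guard must carry minor-version ops. */
		assert(list[i].cl_mvops != NULL);
		count++;
	}
	return count;
}

/* Illustrative list: two v3 clients around a single v4 client. */
size_t filt_demo(void)
{
	static const int fake_mvops = 0;	/* placeholder object to point at */
	const struct filt_client list[] = {
		{ &filt_v3_ops, NULL },
		{ &filt_v4_ops, &fake_mvops },
		{ &filt_v3_ops, NULL },
	};

	return filt_count_v4(list, sizeof(list) / sizeof(list[0]));
}
```

With the guard in place, filt_demo() only ever considers the single v4 entry, so the v3 clients with a NULL cl_mvops are never handed on.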
steved.
* Re: nfs4_state_manager() vs. nfs_server_remove_lists()
2014-07-29 18:39 nfs4_state_manager() vs. nfs_server_remove_lists() Steve Dickson
@ 2014-07-29 19:52 ` Trond Myklebust
2014-07-29 20:40 ` Steve Dickson
From: Trond Myklebust @ 2014-07-29 19:52 UTC (permalink / raw)
To: Steve Dickson; +Cc: Linux NFS Mailing list, Andy Adamson
On Tue, Jul 29, 2014 at 2:39 PM, Steve Dickson <SteveD@redhat.com> wrote:
> Hello,
>
> I've been seeing a panic where nfs4_state_manager()
> ends up processing a v3 NFS client pointer.
>
> The panic happens at the top of nfs4_state_manager()
> because clp->cl_mvops == NULL;
>
> Looking at the pointer (via crash) it becomes obvious
> it is a v3 client pointer (i.e. rpc_ops == nfs_v3_clientops)
>
> Now the reason we are in the state manager code is a NFSv4
> mount doing server discovery so it is walking the client list
> in nfs41_walk_client_list()
>
> Now looking at the entire stack with crash, the
> only time that v3 client pointer appears is after
> nfs41_walk_client_list() has been called so I'm 99%
> sure the pointer is coming from the cl_share_link list.
>
> So the question is how is that v3 client pointer on that
> list, in non NFS_CS_READY state.
>
> Well, simultaneously a V3 mount is happening. In nfs_fs_mount_common()
> it notices there is already an existing super block, so it decides to
> free its server pointer so nfs_server_remove_lists() is called.
>
> What nfs_server_remove_lists() and nfs41_walk_client_list()
> have in common is the nfs_client_lock spin lock.
>
> Also the client pointer in the server pointer being freed is
> in a non NFS_CS_READY state
>
> To answer the question, the v3 client pointer, in a non
> NFS_CS_READY state, is found by nfs41_walk_client_list()
> because it beat nfs_server_remove_lists() to the
> nfs_client_lock spin lock.
>
> nfs41_walk_client_list() finds the uninitialized client
> pointer nfs_server_remove_lists() is trying to free and
> processes it and then falls over...
>
> Note this was very hard to reproduce since a very large client
> (many cores) is needed and a very fast server and a few
> hours...
>
> Question, since both v3 and v4 clients are on the cl_share_link
> list should there be a check in nfs41_walk_client_list() to
> process only v4 clients?
>
Hi Steve,
Let's just move up the test for "pos->rpc_ops != new->rpc_ops",
"pos->cl_minorversion != new->cl_minorversion" and "pos->cl_proto !=
new->cl_proto" so that they all happen before we try to test the value
of cl_cons_state.
As far as I can tell, all those values are guaranteed to be set as
part of the struct nfs_client allocators, before we ever put the
result on the cl_share_link list.
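
A rough user-space sketch of that reordering follows. It is not a patch: the walk_* names are hypothetical stand-ins, and it assumes the convention that cl_cons_state is 0 when ready, positive while still initialising and negative on error.

```c
#include <assert.h>
#include <stddef.h>

#define WALK_CS_READY	0	/* assumed: 0 = ready, > 0 = initialising, < 0 = error */
#define WALK_CS_INITING	1

/* Hypothetical, cut-down stand-ins for the kernel structures. */
struct walk_rpc_ops {
	unsigned int version;
};

struct walk_client {
	const struct walk_rpc_ops *rpc_ops;
	unsigned int cl_minorversion;
	int cl_proto;
	int cl_cons_state;
};

const struct walk_rpc_ops walk_v3_ops = { 3 };
const struct walk_rpc_ops walk_v4_ops = { 4 };

/* How often the walk decided to wait on (recover) an unready client. */
int walk_recoveries;

const struct walk_client *walk_find(const struct walk_client *list, size_t n,
				    const struct walk_client *new)
{
	size_t i;

	assert(list != NULL && new != NULL);
	for (i = 0; i < n; i++) {
		const struct walk_client *pos = &list[i];

		/* Identity checks first: believed to be set by the allocators. */
		if (pos->rpc_ops != new->rpc_ops)
			continue;
		if (pos->cl_proto != new->cl_proto)
			continue;
		if (pos->cl_minorversion != new->cl_minorversion)
			continue;

		/* Only now look at the construction state. */
		if (pos->cl_cons_state > WALK_CS_READY)
			walk_recoveries++;	/* stand-in for lease recovery */
		if (pos->cl_cons_state != WALK_CS_READY)
			continue;
		return pos;
	}
	return NULL;
}

/* Returns 1 if the half-built v3 entry was skipped without any recovery. */
int walk_demo(void)
{
	const struct walk_client list[] = {
		{ &walk_v3_ops, 0, 6, WALK_CS_INITING },	/* half-built v3 client */
		{ &walk_v4_ops, 1, 6, WALK_CS_READY },		/* ready v4.1 client */
	};
	const struct walk_client new = { &walk_v4_ops, 1, 6, WALK_CS_INITING };

	walk_recoveries = 0;
	return walk_find(list, 2, &new) == &list[1] && walk_recoveries == 0;
}
```

In this ordering the half-built v3 entry is rejected on rpc_ops alone, so nothing ever tries to wait on it or recover it.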
Cheers
Trond
--
Trond Myklebust
Linux NFS client maintainer, PrimaryData
trond.myklebust@primarydata.com
* Re: nfs4_state_manager() vs. nfs_server_remove_lists()
2014-07-29 19:52 ` Trond Myklebust
@ 2014-07-29 20:40 ` Steve Dickson
2014-07-29 21:58 ` Trond Myklebust
From: Steve Dickson @ 2014-07-29 20:40 UTC (permalink / raw)
To: Trond Myklebust; +Cc: Linux NFS Mailing list, Andy Adamson
On 29/07/14 15:52, Trond Myklebust wrote:
> Let's just move up the test for "pos->rpc_ops != new->rpc_ops",
> "pos->cl_minorversion != new->cl_minorversion" and "pos->cl_proto !=
> new->cl_proto" so that they all happen before we try to test the value
> of cl_cons_state.
> As far as I can tell, all those values are guaranteed to be set as
> part of the struct nfs_client allocators, before we ever put the
> result on the cl_share_link list.
The check for

    if (pos->cl_cons_state > NFS_CS_READY)

followed right after by

    if (pos->cl_cons_state != NFS_CS_READY)
            continue;

confuses me... Is the second check even needed?
steved.
* Re: nfs4_state_manager() vs. nfs_server_remove_lists()
2014-07-29 20:40 ` Steve Dickson
@ 2014-07-29 21:58 ` Trond Myklebust
[not found] ` <CADnza45-ytG-0GcfS3Q0SczekP9M+F9z5EJA0dMsbhD3V=d2Gg@mail.gmail.com>
From: Trond Myklebust @ 2014-07-29 21:58 UTC (permalink / raw)
To: Steve Dickson; +Cc: Linux NFS Mailing list, Andy Adamson
On Tue, Jul 29, 2014 at 4:40 PM, Steve Dickson <SteveD@redhat.com> wrote:
> On 29/07/14 15:52, Trond Myklebust wrote:
>> Let's just move up the test for "pos->rpc_ops != new->rpc_ops",
>> "pos->cl_minorversion != new->cl_minorversion" and "pos->cl_proto !=
>> new->cl_proto" so that they all happen before we try to test the value
>> of cl_cons_state.
>> As far as I can tell, all those values are guaranteed to be set as
>> part of the struct nfs_client allocators, before we ever put the
>> result on the cl_share_link list.
>
> The check for
> if (pos->cl_cons_state > NFS_CS_READY)
>
> then right after that check is:
>
> if (pos->cl_cons_state != NFS_CS_READY)
> continue;
>
> confuses me... Is the second check even needed?
>
> steved.
Yes. The result of the lease_recovery could be that the nfs_client is
left in a state of error if, say, we get a NFS4ERR_CLID_INUSE beastie.
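
A toy sketch of why both checks matter is below. The rec_* names are hypothetical, REC_CS_ERROR is just a placeholder negative value rather than the real error code, and the state convention (0 ready, positive initialising, negative error) is assumed.

```c
#include <assert.h>
#include <stddef.h>

#define REC_CS_READY	0
#define REC_CS_INITING	1
#define REC_CS_ERROR	(-1)	/* placeholder for whatever error recovery leaves behind */

/* Hypothetical stand-in for the one field of struct nfs_client that matters here. */
struct rec_client {
	int cl_cons_state;
};

/* Stand-in for waiting on lease recovery; `outcome` is the state it ends in. */
void rec_wait_for_recovery(struct rec_client *clp, int outcome)
{
	assert(clp != NULL);
	clp->cl_cons_state = outcome;
}

/* Returns 1 if the entry may be considered for a match, 0 if it must be skipped. */
int rec_usable_after_wait(struct rec_client *pos, int outcome)
{
	/* First check: the client is still initialising, so wait for it. */
	if (pos->cl_cons_state > REC_CS_READY)
		rec_wait_for_recovery(pos, outcome);
	/* Second check: the wait may have ended in an error state. */
	if (pos->cl_cons_state != REC_CS_READY)
		return 0;
	return 1;
}

/* Start from an initialising client and report whether it ends up usable. */
int rec_demo(int outcome)
{
	struct rec_client c = { REC_CS_INITING };

	return rec_usable_after_wait(&c, outcome);
}
```

The first check only says "not ready yet"; after the wait the state can be either ready or an error, which is what the second check filters out.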
Cheers
Trond
--
Trond Myklebust
Linux NFS client maintainer, PrimaryData
trond.myklebust@primarydata.com
* Re: nfs4_state_manager() vs. nfs_server_remove_lists()
[not found] ` <CADnza45-ytG-0GcfS3Q0SczekP9M+F9z5EJA0dMsbhD3V=d2Gg@mail.gmail.com>
@ 2014-09-16 19:28 ` Trond Myklebust
2014-09-17 12:59 ` Steve Dickson
From: Trond Myklebust @ 2014-09-16 19:28 UTC (permalink / raw)
To: Fred Isaman; +Cc: Steve Dickson, Linux NFS Mailing list, Andy Adamson
On Tue, Sep 16, 2014 at 2:51 PM, Fred Isaman <iisaman@netapp.com> wrote:
> Was a patch ever submitted for this? I'm seeing something similar but can't
> find the fix upstream.
As far as I can tell, the answer is no. If you are seeing the bug,
then could you please post the discussed fix?
Cheers
Trond
>
> Fred
>
> On Tue, Jul 29, 2014 at 5:58 PM, Trond Myklebust
> <trond.myklebust@primarydata.com> wrote:
>>
>> On Tue, Jul 29, 2014 at 4:40 PM, Steve Dickson <SteveD@redhat.com> wrote:
>> > On 29/07/14 15:52, Trond Myklebust wrote:
>> >> Let's just move up the test for "pos->rpc_ops != new->rpc_ops",
>> >> "pos->cl_minorversion != new->cl_minorversion" and "pos->cl_proto !=
>> >> new->cl_proto" so that they all happen before we try to test the value
>> >> of cl_cons_state.
>> >> As far as I can tell, all those values are guaranteed to be set as
>> >> part of the struct nfs_client allocators, before we ever put the
>> >> result on the cl_share_link list.
>> >
>> > The check for
>> > if (pos->cl_cons_state > NFS_CS_READY)
>> >
>> > then right after that check is:
>> >
>> > if (pos->cl_cons_state != NFS_CS_READY)
>> > continue;
>> >
>> > confuses me... Is the second check even needed?
>> >
>> > steved.
>>
>> Yes. The result of the lease_recovery could be that the nfs_client is
>> left in a state of error if, say, we get a NFS4ERR_CLID_INUSE beastie.
>>
>> Cheers
>> Trond
>>
>> --
>> Trond Myklebust
>>
>> Linux NFS client maintainer, PrimaryData
>>
>> trond.myklebust@primarydata.com
>
>
--
Trond Myklebust
Linux NFS client maintainer, PrimaryData
trond.myklebust@primarydata.com
* Re: nfs4_state_manager() vs. nfs_server_remove_lists()
2014-09-16 19:28 ` Trond Myklebust
@ 2014-09-17 12:59 ` Steve Dickson
From: Steve Dickson @ 2014-09-17 12:59 UTC (permalink / raw)
To: Trond Myklebust, Fred Isaman; +Cc: Linux NFS Mailing list, Andy Adamson
Hello,
On 09/16/2014 03:28 PM, Trond Myklebust wrote:
> On Tue, Sep 16, 2014 at 2:51 PM, Fred Isaman <iisaman@netapp.com> wrote:
>> > Was a patch ever submitted for this? I'm seeing something similar but can't
>> > find the fix upstream.
> As far as I can tell, the answer is no. If you are seeing the bug,
> then could you please post the discussed fix?
It looks like I dropped the ball... I too am seeing this problem
and cannot find the posted patch... Working on it....
steved.