public inbox for linux-nfs@vger.kernel.org
* nfs4_state_manager() vs. nfs_server_remove_lists()
@ 2014-07-29 18:39 Steve Dickson
  2014-07-29 19:52 ` Trond Myklebust
  0 siblings, 1 reply; 6+ messages in thread
From: Steve Dickson @ 2014-07-29 18:39 UTC (permalink / raw)
  To: Trond Myklebust; +Cc: Linux NFS Mailing list, Andy Adamson

Hello,

I've been seeing a panic where nfs4_state_manager()
ends up processing an NFSv3 client pointer.

The panic happens at the top of nfs4_state_manager()
because clp->cl_mvops == NULL.

Looking at the pointer (via crash), it becomes obvious
it is a v3 client pointer (i.e. rpc_ops == nfs_v3_clientops).

The reason we are in the state manager code is an NFSv4
mount doing server discovery, so it is walking the client list
in nfs41_walk_client_list().

Looking at the entire stack with crash, the
only time that v3 client pointer appears is after
nfs41_walk_client_list() has been called, so I'm 99%
sure the pointer is coming from the cl_share_link list.

So the question is: how is that v3 client pointer on that
list, in a non-NFS_CS_READY state?

Well, simultaneously a v3 mount is happening. In nfs_fs_mount_common()
it notices there is already an existing superblock, so it decides to
free its server pointer, and nfs_server_remove_lists() is called.

What nfs_server_remove_lists() and nfs41_walk_client_list()
have in common is the nfs_client_lock spin lock.

Also, the client pointer in the server pointer being freed is
in a non-NFS_CS_READY state.

To answer the question: the v3 client pointer, in a
non-NFS_CS_READY state, is found by nfs41_walk_client_list()
because it beat nfs_server_remove_lists() to the
nfs_client_lock spin lock.

nfs41_walk_client_list() finds the uninitialized client
pointer that nfs_server_remove_lists() is trying to free,
processes it, and then falls over...
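
To make the failure mode concrete, here is a minimal userspace sketch of
the ordering problem. It is not the kernel code: struct client,
first_sent_to_recovery(), state_manager_would_oops() and the CS_*
values are all hypothetical stand-ins, and the walk is reduced to the
one property that matters here, namely that the construction state is
tested before anything establishes that the entry is a v4 client.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the kernel structures. */
enum cons_state { CS_READY = 0, CS_INITING = 1 };   /* assumed values */

struct minor_ops { int dummy; };     /* stands in for cl_mvops */
struct rpc_ops   { int version; };   /* stands in for rpc_ops  */

struct client {
	const struct rpc_ops   *rpc_ops;
	const struct minor_ops *mvops;       /* NULL for a v3 client */
	int                     cons_state;  /* > CS_READY while initialising */
};

/*
 * Model of the walk as described in the report: the construction state
 * is looked at before any test says whether the entry is a v4 client.
 * Returns the first entry that would be handed on to recovery (and so
 * to the state manager), or NULL if there is none.
 */
static const struct client *
first_sent_to_recovery(const struct client *list, size_t n,
		       const struct client *new)
{
	assert(new != NULL);
	for (size_t i = 0; i < n; i++) {
		const struct client *pos = &list[i];

		if (pos == new)
			break;
		if (pos->cons_state > CS_READY)
			return pos;   /* no version check has happened yet */
	}
	return NULL;
}

/* The state manager starts by using clp->mvops; NULL there is the panic. */
static int state_manager_would_oops(const struct client *clp)
{
	return clp != NULL && clp->mvops == NULL;
}
```

With a list holding a still-initialising v3 entry ahead of the new v4
client, first_sent_to_recovery() returns the v3 entry, and
state_manager_would_oops() reports the NULL mvops that matches the
panic described above.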

Note that this was very hard to reproduce: it needs a very large
client (many cores), a very fast server, and a few hours...

Question: since both v3 and v4 clients are on the cl_share_link
list, should there be a check in nfs41_walk_client_list() to
process only v4 clients?

steved.
 


* Re: nfs4_state_manager() vs. nfs_server_remove_lists()
  2014-07-29 18:39 nfs4_state_manager() vs. nfs_server_remove_lists() Steve Dickson
@ 2014-07-29 19:52 ` Trond Myklebust
  2014-07-29 20:40   ` Steve Dickson
  0 siblings, 1 reply; 6+ messages in thread
From: Trond Myklebust @ 2014-07-29 19:52 UTC (permalink / raw)
  To: Steve Dickson; +Cc: Linux NFS Mailing list, Andy Adamson

On Tue, Jul 29, 2014 at 2:39 PM, Steve Dickson <SteveD@redhat.com> wrote:
> Hello,
>
> I've been seeing a panic where nfs4_state_manager()
> ends up processing an NFSv3 client pointer.
>
> The panic happens at the top of nfs4_state_manager()
> because clp->cl_mvops == NULL.
>
> Looking at the pointer (via crash), it becomes obvious
> it is a v3 client pointer (i.e. rpc_ops == nfs_v3_clientops).
>
> The reason we are in the state manager code is an NFSv4
> mount doing server discovery, so it is walking the client list
> in nfs41_walk_client_list().
>
> Looking at the entire stack with crash, the
> only time that v3 client pointer appears is after
> nfs41_walk_client_list() has been called, so I'm 99%
> sure the pointer is coming from the cl_share_link list.
>
> So the question is: how is that v3 client pointer on that
> list, in a non-NFS_CS_READY state?
>
> Well, simultaneously a v3 mount is happening. In nfs_fs_mount_common()
> it notices there is already an existing superblock, so it decides to
> free its server pointer, and nfs_server_remove_lists() is called.
>
> What nfs_server_remove_lists() and nfs41_walk_client_list()
> have in common is the nfs_client_lock spin lock.
>
> Also, the client pointer in the server pointer being freed is
> in a non-NFS_CS_READY state.
>
> To answer the question: the v3 client pointer, in a
> non-NFS_CS_READY state, is found by nfs41_walk_client_list()
> because it beat nfs_server_remove_lists() to the
> nfs_client_lock spin lock.
>
> nfs41_walk_client_list() finds the uninitialized client
> pointer that nfs_server_remove_lists() is trying to free,
> processes it, and then falls over...
>
> Note that this was very hard to reproduce: it needs a very large
> client (many cores), a very fast server, and a few hours...
>
> Question: since both v3 and v4 clients are on the cl_share_link
> list, should there be a check in nfs41_walk_client_list() to
> process only v4 clients?
>

Hi Steve,

Let's just move up the test for "pos->rpc_ops != new->rpc_ops",
"pos->cl_minorversion != new->cl_minorversion" and "pos->cl_proto !=
new->cl_proto" so that they all happen before we try to test the value
of cl_cons_state.
As far as I can tell, all those values are guaranteed to be set as
part of the struct nfs_client allocators, before we ever put the
result on the cl_share_link list.
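
A rough userspace sketch of that reordering, for illustration only: the
struct, same_flavour(), count_waited_on() and the CS_* values are
hypothetical stand-ins rather than the real fs/nfs code. The three
comparisons run first, so an entry of a different flavour is skipped
before its construction state is ever looked at.

```c
#include <assert.h>
#include <stddef.h>

enum cons_state { CS_READY = 0, CS_INITING = 1 };   /* assumed values */

struct rpc_ops { int version; };     /* stands in for rpc_ops */

struct client {
	const struct rpc_ops *rpc_ops;
	int                   proto;
	unsigned int          minorversion;
	int                   cons_state;
};

/* Fields assumed to be set at allocation time, before list insertion. */
static int same_flavour(const struct client *pos, const struct client *new)
{
	return pos->rpc_ops == new->rpc_ops &&
	       pos->proto == new->proto &&
	       pos->minorversion == new->minorversion;
}

/*
 * Walk the list with the flavour tests moved up. Returns how many
 * entries would still be waited on because they are initialising.
 */
static size_t count_waited_on(const struct client *list, size_t n,
			      const struct client *new)
{
	size_t waited = 0;

	assert(new != NULL);
	for (size_t i = 0; i < n; i++) {
		const struct client *pos = &list[i];

		if (pos == new)
			break;
		if (!same_flavour(pos, new))
			continue;            /* v3 entries stop here */
		if (pos->cons_state > CS_READY)
			waited++;            /* only matching clients get this far */
	}
	return waited;
}
```

In this model a still-initialising v3 entry never reaches the
cons_state test, while a still-initialising client of the same
flavour as the new one is waited on as before.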

Cheers
  Trond

-- 
Trond Myklebust

Linux NFS client maintainer, PrimaryData

trond.myklebust@primarydata.com


* Re: nfs4_state_manager() vs. nfs_server_remove_lists()
  2014-07-29 19:52 ` Trond Myklebust
@ 2014-07-29 20:40   ` Steve Dickson
  2014-07-29 21:58     ` Trond Myklebust
  0 siblings, 1 reply; 6+ messages in thread
From: Steve Dickson @ 2014-07-29 20:40 UTC (permalink / raw)
  To: Trond Myklebust; +Cc: Linux NFS Mailing list, Andy Adamson

On 29/07/14 15:52, Trond Myklebust wrote:
> Let's just move up the test for "pos->rpc_ops != new->rpc_ops",
> "pos->cl_minorversion != new->cl_minorversion" and "pos->cl_proto !=
> new->cl_proto" so that they all happen before we try to test the value
> of cl_cons_state.
> As far as I can tell, all those values are guaranteed to be set as
> part of the struct nfs_client allocators, before we ever put the
> result on the cl_share_link list.

The check for

   if (pos->cl_cons_state > NFS_CS_READY)

followed right after by:

   if (pos->cl_cons_state != NFS_CS_READY)
         continue;

confuses me... Is the second check even needed?

steved.


* Re: nfs4_state_manager() vs. nfs_server_remove_lists()
  2014-07-29 20:40   ` Steve Dickson
@ 2014-07-29 21:58     ` Trond Myklebust
       [not found]       ` <CADnza45-ytG-0GcfS3Q0SczekP9M+F9z5EJA0dMsbhD3V=d2Gg@mail.gmail.com>
  0 siblings, 1 reply; 6+ messages in thread
From: Trond Myklebust @ 2014-07-29 21:58 UTC (permalink / raw)
  To: Steve Dickson; +Cc: Linux NFS Mailing list, Andy Adamson

On Tue, Jul 29, 2014 at 4:40 PM, Steve Dickson <SteveD@redhat.com> wrote:
> On 29/07/14 15:52, Trond Myklebust wrote:
>> Let's just move up the test for "pos->rpc_ops != new->rpc_ops",
>> "pos->cl_minorversion != new->cl_minorversion" and "pos->cl_proto !=
>> new->cl_proto" so that they all happen before we try to test the value
>> of cl_cons_state.
>> As far as I can tell, all those values are guaranteed to be set as
>> part of the struct nfs_client allocators, before we ever put the
>> result on the cl_share_link list.
>
> The check for
>
>    if (pos->cl_cons_state > NFS_CS_READY)
>
> followed right after by:
>
>    if (pos->cl_cons_state != NFS_CS_READY)
>          continue;
>
> confuses me... Is the second check even needed?
>
> steved.

Yes. The result of the lease_recovery could be that the nfs_client is
left in a state of error if, say, we get a NFS4ERR_CLID_INUSE beastie.
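
As a small illustration of why both tests are there, here is a
userspace sketch under stated assumptions: wait_for_recovery(),
usable_after_wait() and the CS_* values are hypothetical, and CS_FAILED
is just a placeholder for whatever negative error the client ends up
with. The first test decides whether to wait; the second looks at what
the wait left behind.

```c
#include <assert.h>

/* Assumed values: 0 is ready, positive is initialising, negative is error. */
enum { CS_FAILED = -1, CS_READY = 0, CS_INITING = 1 };

struct client {
	int cons_state;
};

/*
 * Hypothetical stand-in for waiting on initialisation and lease
 * recovery: when it returns, the client is either ready or in error.
 */
static void wait_for_recovery(struct client *pos, int outcome)
{
	pos->cons_state = outcome;
}

/* Returns 1 if the entry may be considered further, 0 if it is skipped. */
static int usable_after_wait(struct client *pos, int outcome)
{
	if (pos->cons_state > CS_READY)      /* first check: still initialising */
		wait_for_recovery(pos, outcome);

	if (pos->cons_state != CS_READY)     /* second check: the wait may have */
		return 0;                    /* ended in an error state         */

	return 1;
}
```

Dropping the second test would let a client whose recovery failed be
treated as a match, which is the case being described here.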

Cheers
  Trond

-- 
Trond Myklebust

Linux NFS client maintainer, PrimaryData

trond.myklebust@primarydata.com


* Re: nfs4_state_manager() vs. nfs_server_remove_lists()
       [not found]       ` <CADnza45-ytG-0GcfS3Q0SczekP9M+F9z5EJA0dMsbhD3V=d2Gg@mail.gmail.com>
@ 2014-09-16 19:28         ` Trond Myklebust
  2014-09-17 12:59           ` Steve Dickson
  0 siblings, 1 reply; 6+ messages in thread
From: Trond Myklebust @ 2014-09-16 19:28 UTC (permalink / raw)
  To: Fred Isaman; +Cc: Steve Dickson, Linux NFS Mailing list, Andy Adamson

On Tue, Sep 16, 2014 at 2:51 PM, Fred Isaman <iisaman@netapp.com> wrote:
> Was a patch ever submitted for this?  I'm seeing something similar but can't
> find the fix upstream.

As far as I can tell, the answer is no. If you are seeing the bug,
then could you please post the discussed fix?

Cheers
  Trond

>
> Fred
>

-- 
Trond Myklebust

Linux NFS client maintainer, PrimaryData

trond.myklebust@primarydata.com


* Re: nfs4_state_manager() vs. nfs_server_remove_lists()
  2014-09-16 19:28         ` Trond Myklebust
@ 2014-09-17 12:59           ` Steve Dickson
  0 siblings, 0 replies; 6+ messages in thread
From: Steve Dickson @ 2014-09-17 12:59 UTC (permalink / raw)
  To: Trond Myklebust, Fred Isaman; +Cc: Linux NFS Mailing list, Andy Adamson

Hello,

On 09/16/2014 03:28 PM, Trond Myklebust wrote:
> On Tue, Sep 16, 2014 at 2:51 PM, Fred Isaman <iisaman@netapp.com> wrote:
>> Was a patch ever submitted for this?  I'm seeing something similar but can't
>> find the fix upstream.
> As far as I can tell, the answer is no. If you are seeing the bug,
> then could you please post the discussed fix?

It looks like I dropped the ball... I too am seeing this problem
and cannot find the posted patch... Working on it...

steved.
