From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: linux-nfs-owner@vger.kernel.org Received: from mail.candelatech.com ([208.74.158.172]:40075 "EHLO ns3.lanforge.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1755006Ab3ARWeJ (ORCPT ); Fri, 18 Jan 2013 17:34:09 -0500 Message-ID: <50F9CDDF.4070609@candelatech.com> Date: Fri, 18 Jan 2013 14:34:07 -0800 From: Ben Greear MIME-Version: 1.0 To: Chuck Lever CC: "linux-nfs@vger.kernel.org" Subject: Re: Question on nfs40_discover_server_trunking. References: <50F9BE66.6080608@candelatech.com> <0F001F0E-229D-4314-A42E-84402E4F1FC7@oracle.com> <50F9C5B4.5020000@candelatech.com> <1A56CC87-38FF-47B4-9CA9-7BAE394AF0D2@oracle.com> In-Reply-To: <1A56CC87-38FF-47B4-9CA9-7BAE394AF0D2@oracle.com> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Sender: linux-nfs-owner@vger.kernel.org List-ID: On 01/18/2013 02:29 PM, Chuck Lever wrote: > > On Jan 18, 2013, at 4:59 PM, Ben Greear wrote: > >> On 01/18/2013 01:33 PM, Chuck Lever wrote: >>> >>> On Jan 18, 2013, at 4:28 PM, Ben Greear wrote: >>> >>>> Any chance the STALE_CLIENTID case needs a 'break'? >>> >>> I don't think so. LEASE_CONFIRM is set, and we want to wake the state renewal thread. >>> >>>> >>>> Twice I've seen kernel crashes after the nfs40_walk_client_list >>>> failed (though code comments say it should never fail). >>> >>> nfs40_walk_client_list() is looking for an nfs_client that is supposed to already be in the nfs_client list. If the search fails, that's a bug. >>> >>> Eyeball the contents of your nfs_client list. You should find an appropriate nfs_client in there, and then figure out why the search doesn't find it. >> >> Ok, I think I found another issue. >> >> nfs4_client_init does not initialize 'old', but passes it to nfs4_discover_server_trunking. >> >> That gets passed to the detect_trunking operation, which is nfs40_discover_server_trunking >> or nfs41_discover_server_trunking. > > Yes, *result is an output parameter, not an input parameter. > >> This will call walk_client_list, which also may not ever assign a value to >> 'result'. > > It should always assign *result in the SUCCESS case. > >> The code in walk_client_list always dereferences result, however. >> >> So, that is probably why my system blows up shortly after the 'impossible' >> error message... > > In particular, nfs4_walk_client_list() does not set *result when it returns NFS4ERR_STALE_CLIENTID. In nfs4_discover_server_trunking(), should we therefore have this: > > status = nfs40_walk_client_list(clp, result, cred); > switch (status) { > case -NFS4ERR_STALE_CLIENTID: > set_bit(NFS4CLNT_LEASE_CONFIRM, &clp->cl_state); >>> nfs4_schedule_state_renewal(clp); >>> break; > case 0: > > I'm not sure the nfs4_schedule_state_renewal() call is necessary here. I'm not sure I follow you entirely..but I'm going to start testing this patch in a minute. Looks like it should fix my crash, and maybe give clues about why we get to the error case in the first place. diff --git a/fs/nfs/nfs4client.c b/fs/nfs/nfs4client.c index d6b39a9..3188283 100644 --- a/fs/nfs/nfs4client.c +++ b/fs/nfs/nfs4client.c @@ -307,6 +307,8 @@ int nfs40_walk_client_list(struct nfs_client *new, }; int status; + *result = NULL; + spin_lock(&nn->nfs_client_lock); list_for_each_entry_safe(pos, n, &nn->nfs_client_list, cl_share_link) { /* If "pos" isn't marked ready, we can't trust the @@ -364,6 +366,7 @@ int nfs40_walk_client_list(struct nfs_client *new, nfs_put_client(prev); spin_unlock(&nn->nfs_client_lock); pr_err("NFS: %s Error: no matching nfs_client found\n", __func__); + WARN_ON(1); return -NFS4ERR_STALE_CLIENTID; } @@ -433,6 +436,8 @@ int nfs41_walk_client_list(struct nfs_client *new, struct nfs_client *pos, *n, *prev = NULL; int error; + *result = NULL; + spin_lock(&nn->nfs_client_lock); list_for_each_entry_safe(pos, n, &nn->nfs_client_list, cl_share_link) { /* If "pos" isn't marked ready, we can't trust the @@ -489,6 +494,7 @@ int nfs41_walk_client_list(struct nfs_client *new, */ spin_unlock(&nn->nfs_client_lock); pr_err("NFS: %s Error: no matching nfs_client found\n", __func__); + WARN_ON(1); return -NFS4ERR_STALE_CLIENTID; } #endif /* CONFIG_NFS_V4_1 */ diff --git a/fs/nfs/nfs4state.c b/fs/nfs/nfs4state.c index c351e6b..54d29e2 100644 --- a/fs/nfs/nfs4state.c +++ b/fs/nfs/nfs4state.c @@ -142,7 +142,8 @@ int nfs40_discover_server_trunking(struct nfs_client *clp, case 0: /* Sustain the lease, even if it's empty. If the clientid4 * goes stale it's of no use for trunking discovery. */ - nfs4_schedule_state_renewal(*result); + if (*result) + nfs4_schedule_state_renewal(*result); break; } -- Ben Greear Candela Technologies Inc http://www.candelatech.com