From: NeilBrown <neilb@suse.com>
To: Olga Kornievskaia <aglo@umich.edu>,
Trond Myklebust <trondmy@primarydata.com>
Cc: Adamson William <andros@netapp.com>,
Schumaker Anna <anna.schumaker@netapp.com>,
List Linux NFS Mailing <linux-nfs@vger.kernel.org>
Subject: Re: [PATCH] NFSv4.1: nfs4_fl_prepare_ds must be careful about reporting success.
Date: Sat, 19 Nov 2016 17:33:12 +1100 [thread overview]
Message-ID: <87wpg0dlnb.fsf@notabene.neil.brown.name> (raw)
In-Reply-To: <CAN-5tyH+4=9y9a3v0po5-cW+5ABSp5Z1pXN=+eTZSnd0KORk2Q@mail.gmail.com>
[-- Attachment #1: Type: text/plain, Size: 3744 bytes --]
On Sat, Nov 19 2016, Olga Kornievskaia wrote:
> On Fri, Nov 18, 2016 at 9:32 AM, Trond Myklebust
> <trondmy@primarydata.com> wrote:
>>
>>> On Nov 18, 2016, at 00:01, NeilBrown <neilb@suse.com> wrote:
>>>
>>>
>>> Ping?
>>> No sign of this in linux-next, and no replies….
>>
>> I’d like some confirmation from the NetApp folks on this. They are the main stakeholders for the pNFS files implementation. I don’t think we need an equivalent for flex files.
>
> Just to see if I understand this patch. If "ds" isn't NULL, then the
> client has an RPC client with the ds. But it doesn't have a valid NFS
> client? Looking at the code I really don't see how that can happen.
I thought I explained that, but I was rather brief.
nfs4_fl_prepare_ds() calls nfs4_pnfs_ds_connect() in order to create the
NFS client. The first time it is called it will set NFS4DS_CONNECTING
and then call _nfs4_pnfs_v?_ds_connect() which will establish the
connection and set ds->ds_clp. If the server is unresponsive, this an
block indefinitely.
If a second thread then tries the same time, it will enter
nfs4_pnfs_ds_connect() and will discovery that NFS4DS_CONNECTING is
already set, so it will call nfs4_wait_ds_connect().
As nfs4_wait_ds_connect() waits with TASK_KILLABLE, it will abort if the
process is kill by an uncaught signal (such as SIGKILL).
In this case it will return even though NFS4DS_CONNECTING is still set,
and ds->ds_clp is still NULL. i.e. the first thread hasn't established
the connection yet.
As the process has been killed, the 'ds' that it holds that doesn't have
an NFS client handle will never be used. But we need to be sure that
the process will exit cleanly without trying to de-reference that NULL
ds_clp.
That is what the patch ensures.
> Maybe there is a bug elsewhere? If we return here, is there a chance
> we are going to have a zombie RPC client with the ds? If that's not a
> problem then I don't think there an issue to assume that if there is
> no valid NFS client then we would want to return a NULL from
> nfs4_fl_prepare_ds().
It isn't a "zombie RPC client", but rather an unborn RPC client, which
still has NFS4DS_CONNECTING set.
Thanks,
NeilBrown
>
>
>>
>>>
>>> Thanks,
>>> NeilBrown
>>>
>>>
>>> On Fri, Oct 21 2016, NeilBrown wrote:
>>>
>>>>
>>>> Various places assume that if nfs4_fl_prepare_ds() turns a non-NULL
>>>> 'ds', then ds->ds_clp will also be non-NULL.
>>>>
>>>> This is not necessasrily true in the case when the process received a
>>>> fatal signal while nfs4_pnfs_ds_connect is waiting in
>>>> nfs4_wait_ds_connect(). In that case ->ds_clp may not be set, and the
>>>> devid may not recently have been marked unavailable.
>>>>
>>>> So add a test for ->ds_clp == NULL and return NULL in that case.
>>>>
>>>> Fixes: c23266d532b4 ("NFS4.1 Fix data server connection race")
>>>> Signed-off-by: NeilBrown <neilb@suse.com>
>>>> ---
>>>> fs/nfs/filelayout/filelayoutdev.c | 3 ++-
>>>> 1 file changed, 2 insertions(+), 1 deletion(-)
>>>>
>>>> diff --git a/fs/nfs/filelayout/filelayoutdev.c b/fs/nfs/filelayout/filelayoutdev.c
>>>> index 4946ef40ba87..85ef38f9765f 100644
>>>> --- a/fs/nfs/filelayout/filelayoutdev.c
>>>> +++ b/fs/nfs/filelayout/filelayoutdev.c
>>>> @@ -283,7 +283,8 @@ nfs4_fl_prepare_ds(struct pnfs_layout_segment *lseg, u32 ds_idx)
>>>> s->nfs_client->cl_rpcclient->cl_auth->au_flavor);
>>>>
>>>> out_test_devid:
>>>> - if (filelayout_test_devid_unavailable(devid))
>>>> + if (ret->ds_clp == NULL ||
>>>> + filelayout_test_devid_unavailable(devid))
>>>> ret = NULL;
>>>> out:
>>>> return ret;
>>>> --
>>>> 2.10.1
>>
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 800 bytes --]
next prev parent reply other threads:[~2016-11-19 6:33 UTC|newest]
Thread overview: 9+ messages / expand[flat|nested] mbox.gz Atom feed top
2016-10-21 0:01 [PATCH] NFSv4.1: nfs4_fl_prepare_ds must be careful about reporting success NeilBrown
2016-11-18 5:01 ` NeilBrown
2016-11-18 14:32 ` Trond Myklebust
2016-11-18 17:34 ` Olga Kornievskaia
2016-11-19 6:33 ` NeilBrown [this message]
2016-11-22 20:32 ` Adamson, Andy
2016-11-23 17:37 ` Olga Kornievskaia
2016-12-19 0:19 ` [PATCH resend] " NeilBrown
2016-12-19 0:28 ` Trond Myklebust
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=87wpg0dlnb.fsf@notabene.neil.brown.name \
--to=neilb@suse.com \
--cc=aglo@umich.edu \
--cc=andros@netapp.com \
--cc=anna.schumaker@netapp.com \
--cc=linux-nfs@vger.kernel.org \
--cc=trondmy@primarydata.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).