From: Zhang Jingwang <yyalone@gmail.com>
To: Benny Halevy <bhalevy@panasas.com>
Cc: NFS list <linux-nfs@vger.kernel.org>
Subject: Re: SQUASHME: missing from FIXME: async layout return
Date: Fri, 14 May 2010 11:28:46 +0800 [thread overview]
Message-ID: <AANLkTikG-QLid-R4Zt0UJ29bbimpYXjA8XEQ0AVNxMhr@mail.gmail.com> (raw)
In-Reply-To: <4BEC26BC.2040900@panasas.com>
2010/5/14 Benny Halevy <bhalevy@panasas.com>:
> On May. 13, 2010, 19:04 +0300, "William A. (Andy) Adamson" <androsada=
mson@gmail.com> wrote:
>> On Thu, May 13, 2010 at 11:31 AM, Boaz Harrosh <bharrosh@panasas.com=
> wrote:
>>> On 05/13/2010 06:17 PM, William A. (Andy) Adamson wrote:
>>>> On Thu, May 13, 2010 at 10:40 AM, Benny Halevy <bhalevy-C4P08NqkoRmIwRZHo2/mJg@public.gmane.org=
m> wrote:
>>>>> On May. 13, 2010, 17:19 +0300, "William A. (Andy) Adamson" <andro=
sadamson@gmail.com> wrote:
>>>>>> On Thu, May 13, 2010 at 4:40 AM, Boaz Harrosh <bharrosh@panasas.=
com> wrote:
>>>>>>> I've tested the patch:
>>>>>>> =A0 =A0 =A0 =A0FIXME: async layout return
>>>>>>>
>>>>>>> And there is a missing small hunk
>>>>>>>
>>>>>>> I have tested with this patch and it is a very good patch
>>>>>>> that should also go into 2.6.33. It is necessary in the rare
>>>>>>> case when one inode have more then one open_context.
>>>>>>
>>>>>> Do you mean more than one open context per open owner?
>>>>>
>>>>> What we see is one "regular" open context and one which is the la=
yout_commit_ctx
>>>>
>>>> Isn't that a BUG?
>>>>
>>>> Here is what we're seeing:
>>>>
>>>> nfs_file_release->nfs_release->nfs_file_clear_open_context->__put_=
nfs_open_context->
>>>> NFS_PROTO(inode)->close_context->nfs4_close_sync->__nfs4_close->pn=
fs_layoutcommit_inode,
>>>>
>>>> This is the same code that Boaz's 'missing small hunk' adds the wa=
it
>>>> to the pnfs_layout_commit_inode to. This is good, because when
>>>> __nfs4_close is called with sync, every thing must be sent/returne=
d
>>>> prior to the nfs_do_close call.
>>>>
>>>> But there is still a problem. Here is the __nfs4_close call (with =
out
>>>> the wait that Boaz added)
>>>>
>>>> =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 if (nfsi->layoutcommit_ctx)
>>>> =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 pnfs_layoutcommit_=
inode(state->inode, 0);
>>>> =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 if (has_layout(nfsi) && nfsi->layo=
ut.roc_iomode) {
>>>> =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 struct nfs4_pnfs_l=
ayout_segment range;
>>>>
>>>> =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 range.iomode =3D n=
fsi->layout.roc_iomode;
>>>> =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 range.offset =3D 0=
;
>>>> =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 range.length =3D N=
=46S4_MAX_UINT64;
>>>> =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 pnfs_return_layout=
(state->inode, &range, NULL,
>>>> =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0=
=A0 =A0 =A0 =A0 =A0RETURN_FILE);
>>>>
>>>> Note that a pnfs_return_layout which is always 'sync' (async with =
wait
>>>> for completion). So the (currently) async LAYOUTCOMMIT call return=
s,
>>>> and a LAYOUTRETURN is put on the wire
>>>>
>>>> Then the LAYOUTCOMMIT rpc_call_done routine calls
>>>> pnfs_layoutcommit_done which calls put_nfs_open_context. If the op=
en
>>>> context is different from the open context that was put by
>>>> nfs_file_release, then
>>>>
>>>> pnfs_layoutcommit_done->put_nfs_open_context-> ..... ->__nfs4_clos=
e
>>>> and the return on close LAYOUTRETURN is sent again!
>>>>
>>>> Of couse this second LAYOUTRETURN either gets a zero stateid, or u=
ses
>>>> the same stateid as the first LAYOUTRETURN, and the reply to the
>>>> second LAYOUTRETURN will result in a NFS4ERR_BAD_STATEID .....
>>>>
>>>> This will occur whether the LAYOUTCOMMIT is async or sync as they =
both
>>>> call pnfs_layoutcommit_done.
>>>>
>>>> I need to understand why there are two open contexts. On the face =
of
>>>> it, it seems wrong.
>>>>
>>>> We also add a pointer to the open context in the nfs_write_data, a=
nd
>>>> in the pnfs_layoutcommit_data. Do we take a reference on the open =
data
>>>> in theses cases? .
>>>>
>>>> -->Andy
>>>>
>>>>
>>>
>>> Yes on all acounts. This is what Benny's patch is fixing please try=
it out
>>> it is most important (Add my small fix)
>>
>> Is this what you mean?
>>
>> With Benny's patch, with return-on-close set and the layoutcommit_ct=
x
>> different from the open context that caused the __nfs4_close:
>>
>> The (now) sync LAYOUTCOMMIT triggered from the nfs_file_release
>> __nfs4_close() will complete prior to calling the return-on-close
>> LAYOUTRETURN call. Since pnfs_layoutcommit_done is still called,
>> put_nfs_open_context will be call on the layoutconmit_ctx (saved in
>> pnfs_layoutcommit_data->ctx) and __nfs4_close will be called.
>>
>> In _this_ instance of calling __nfs4_close (from
>> pnfs_layoutcommit_done), the nfsi->layoutcommit_ctx pointer is null =
so
>> the pnfs_layoutcommit_inode call is skipped, and =A0pnfs_return_layo=
ut
>> is called. Since it is also a sync call, pnfs_layoutcommit_done does
>> not return until it is complete. So the layout is returned, the layo=
ut
>> is freed (all layout segments 'cause the range covers the whole file=
)
>> and pnfs_layoutcommit_inode returns to the FIRST __nfs4_close (from
>> nfs_file_release)
>>
>> Now since return-on-close is set, pnfs_layout_return is called (in t=
he
>> nfs_file_release __nfs4_close). This LAYOUTRETURN call will never go
>> on the wire because the first pnfs_layout_return (from the
>> pnfs_layoutcommit_done __nfs4_close) has removed the layout segment,
>> and all is well.
>>
>> Sheese.
>
> Yeah :-/
> I'm feeling increasingly uncomfortable with the current implementatio=
n
> and we're planning to come up with the state-machine based model
> for the Bakeathon and test it there. =A0Ideally, this will be stable =
enough
> that we can get the gist of it into pnfs-submit.
>
> Benny
>
Is there any document about the state-machine based model?
>>
>> --->Andy
>>
>>
>>>
>>> And about the multiple contexts I don't understand as well but the =
fact of it
>>> is that an nfs_inode has an list_head of open_contexts and we must =
deal properly
>>> with when that happens. (BTW with 2.6.34 Kernel it happens regularl=
y as in
>>> prev kernels it almost never trigger.)
>>>
>>> Benny's patch takes care of that and I've done heavy testing of it =
I can see cases
>>> of more then one contexts and it works correctly.
>>>
>>> But I too would like to understand when is it that an inode can hav=
e more then
>>> one open_context
>>>
>>> Boaz
>>>
>>>>
>>>>>
>>>>> Benny
>>>>>
>>>>>>
>>>>>> -->Andy
>>>>>>
>>>>>>>
>>>>>>> (For some reason I see that happening much more in 2.6.34
>>>>>>> =A0I don't understand why)
>>>>>>>
>>>>>>> Boaz
>>>>>>> ---
>>>>>>> git diff --stat -p -M
>>>>>>> =A0fs/nfs/nfs4state.c | =A0 =A02 +-
>>>>>>> =A01 files changed, 1 insertions(+), 1 deletions(-)
>>>>>>>
>>>>>>> diff --git a/fs/nfs/nfs4state.c b/fs/nfs/nfs4state.c
>>>>>>> index 15c8bc8..6dbe893 100644
>>>>>>> --- a/fs/nfs/nfs4state.c
>>>>>>> +++ b/fs/nfs/nfs4state.c
>>>>>>> @@ -590,7 +590,7 @@ static void __nfs4_close(struct path *path,=
struct nfs4_state *state, fmode_t fm
>>>>>>> =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0struct nfs_inode *nfsi =3D NFS_I=
(state->inode);
>>>>>>>
>>>>>>> =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0if (nfsi->layoutcommit_ctx)
>>>>>>> - =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 pnfs_layoutcommit=
_inode(state->inode, 0);
>>>>>>> + =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 pnfs_layoutcommit=
_inode(state->inode, wait);
>>>>>>> =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0if (has_layout(nfsi) && nfsi->la=
yout.roc_iomode) {
>>>>>>> =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0struct nfs4_pnfs=
_layout_segment range;
>>>>>>>
>>>>>>> --
>>>>>>> To unsubscribe from this list: send the line "unsubscribe linux=
-nfs" in
>>>>>>> the body of a message to majordomo@vger.kernel.org
>>>>>>> More majordomo info at =A0http://vger.kernel.org/majordomo-info=
=2Ehtml
>>>>>>>
>>>>>
>>>> --
>>>> To unsubscribe from this list: send the line "unsubscribe linux-nf=
s" in
>>>> the body of a message to majordomo@vger.kernel.org
>>>> More majordomo info at =A0http://vger.kernel.org/majordomo-info.ht=
ml
>>>>
>>>
>>>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-nfs" =
in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at =A0http://vger.kernel.org/majordomo-info.html
>
--=20
Zhang Jingwang
National Research Centre for High Performance Computers
Institute of Computing Technology, Chinese Academy of Sciences
No. 6, South Kexueyuan Road, Haidian District
Beijing, China
next prev parent reply other threads:[~2010-05-14 3:28 UTC|newest]
Thread overview: 12+ messages / expand[flat|nested] mbox.gz Atom feed top
2010-05-13 8:40 SQUASHME: missing from FIXME: async layout return Boaz Harrosh
2010-05-13 14:07 ` William A. (Andy) Adamson
[not found] ` <AANLkTilE4ao7UadJD58EzbExdfwE4zU5DF-6Nit29Yjq-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2010-05-13 14:39 ` Benny Halevy
2010-05-13 14:19 ` William A. (Andy) Adamson
[not found] ` <AANLkTind3eXmuPw5HrKqB6bHJSSXFL3g2cDSEyyYc1n8-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2010-05-13 14:40 ` Benny Halevy
2010-05-13 15:17 ` William A. (Andy) Adamson
[not found] ` <AANLkTilhbkde-kAPGSBHT1b8vIZq4cM1SPpI-Rd0iOb7-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2010-05-13 15:31 ` Boaz Harrosh
2010-05-13 16:04 ` William A. (Andy) Adamson
2010-05-13 16:20 ` Benny Halevy
2010-05-14 3:28 ` Zhang Jingwang [this message]
2010-05-17 6:02 ` Benny Halevy
2010-05-13 16:23 ` Boaz Harrosh
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=AANLkTikG-QLid-R4Zt0UJ29bbimpYXjA8XEQ0AVNxMhr@mail.gmail.com \
--to=yyalone@gmail.com \
--cc=bhalevy@panasas.com \
--cc=linux-nfs@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).