linux-nfs.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Zhang Jingwang <yyalone@gmail.com>
To: Benny Halevy <bhalevy@panasas.com>
Cc: NFS list <linux-nfs@vger.kernel.org>
Subject: Re: SQUASHME: missing from FIXME: async layout return
Date: Fri, 14 May 2010 11:28:46 +0800	[thread overview]
Message-ID: <AANLkTikG-QLid-R4Zt0UJ29bbimpYXjA8XEQ0AVNxMhr@mail.gmail.com> (raw)
In-Reply-To: <4BEC26BC.2040900@panasas.com>

2010/5/14 Benny Halevy <bhalevy@panasas.com>:
> On May. 13, 2010, 19:04 +0300, "William A. (Andy) Adamson" <androsada=
mson@gmail.com> wrote:
>> On Thu, May 13, 2010 at 11:31 AM, Boaz Harrosh <bharrosh@panasas.com=
> wrote:
>>> On 05/13/2010 06:17 PM, William A. (Andy) Adamson wrote:
>>>> On Thu, May 13, 2010 at 10:40 AM, Benny Halevy <bhalevy-C4P08NqkoRmIwRZHo2/mJg@public.gmane.org=
m> wrote:
>>>>> On May. 13, 2010, 17:19 +0300, "William A. (Andy) Adamson" <andro=
sadamson@gmail.com> wrote:
>>>>>> On Thu, May 13, 2010 at 4:40 AM, Boaz Harrosh <bharrosh@panasas.=
com> wrote:
>>>>>>> I've tested the patch:
>>>>>>> =A0 =A0 =A0 =A0FIXME: async layout return
>>>>>>>
>>>>>>> And there is a missing small hunk
>>>>>>>
>>>>>>> I have tested with this patch and it is a very good patch
>>>>>>> that should also go into 2.6.33. It is necessary in the rare
>>>>>>> case when one inode have more then one open_context.
>>>>>>
>>>>>> Do you mean more than one open context per open owner?
>>>>>
>>>>> What we see is one "regular" open context and one which is the la=
yout_commit_ctx
>>>>
>>>> Isn't that a BUG?
>>>>
>>>> Here is what we're seeing:
>>>>
>>>> nfs_file_release->nfs_release->nfs_file_clear_open_context->__put_=
nfs_open_context->
>>>> NFS_PROTO(inode)->close_context->nfs4_close_sync->__nfs4_close->pn=
fs_layoutcommit_inode,
>>>>
>>>> This is the same code that Boaz's 'missing small hunk' adds the wa=
it
>>>> to the pnfs_layout_commit_inode to. This is good, because when
>>>> __nfs4_close is called with sync, every thing must be sent/returne=
d
>>>> prior to the nfs_do_close call.
>>>>
>>>> But there is still a problem. Here is the __nfs4_close call (with =
out
>>>> the wait that Boaz added)
>>>>
>>>> =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 if (nfsi->layoutcommit_ctx)
>>>> =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 pnfs_layoutcommit_=
inode(state->inode, 0);
>>>> =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 if (has_layout(nfsi) && nfsi->layo=
ut.roc_iomode) {
>>>> =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 struct nfs4_pnfs_l=
ayout_segment range;
>>>>
>>>> =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 range.iomode =3D n=
fsi->layout.roc_iomode;
>>>> =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 range.offset =3D 0=
;
>>>> =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 range.length =3D N=
=46S4_MAX_UINT64;
>>>> =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 pnfs_return_layout=
(state->inode, &range, NULL,
>>>> =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0=
 =A0 =A0 =A0 =A0 =A0RETURN_FILE);
>>>>
>>>> Note that a pnfs_return_layout which is always 'sync' (async with =
wait
>>>> for completion). So the (currently) async LAYOUTCOMMIT call return=
s,
>>>> and a LAYOUTRETURN is put on the wire
>>>>
>>>> Then the LAYOUTCOMMIT rpc_call_done routine calls
>>>> pnfs_layoutcommit_done which calls put_nfs_open_context. If the op=
en
>>>> context is different from the open context that was put by
>>>> nfs_file_release, then
>>>>
>>>> pnfs_layoutcommit_done->put_nfs_open_context-> ..... ->__nfs4_clos=
e
>>>> and the return on close LAYOUTRETURN is sent again!
>>>>
>>>> Of couse this second LAYOUTRETURN either gets a zero stateid, or u=
ses
>>>> the same stateid as the first LAYOUTRETURN, and the reply to the
>>>> second LAYOUTRETURN will result in a NFS4ERR_BAD_STATEID .....
>>>>
>>>> This will occur whether the LAYOUTCOMMIT is async or sync as they =
both
>>>> call pnfs_layoutcommit_done.
>>>>
>>>> I need to understand why there are two open contexts. On the face =
of
>>>> it, it seems wrong.
>>>>
>>>> We also add a pointer to the open context in the nfs_write_data, a=
nd
>>>> in the pnfs_layoutcommit_data. Do we take a reference on the open =
data
>>>> in theses cases? .
>>>>
>>>> -->Andy
>>>>
>>>>
>>>
>>> Yes on all acounts. This is what Benny's patch is fixing please try=
 it out
>>> it is most important (Add my small fix)
>>
>> Is this what you mean?
>>
>> With Benny's patch, with return-on-close set and the layoutcommit_ct=
x
>> different from the open context that caused the __nfs4_close:
>>
>> The (now) sync LAYOUTCOMMIT triggered from the nfs_file_release
>> __nfs4_close() will complete prior to calling the return-on-close
>> LAYOUTRETURN call. Since pnfs_layoutcommit_done is still called,
>> put_nfs_open_context will be call on the layoutconmit_ctx (saved in
>> pnfs_layoutcommit_data->ctx) and __nfs4_close will be called.
>>
>> In _this_ instance of calling __nfs4_close (from
>> pnfs_layoutcommit_done), the nfsi->layoutcommit_ctx pointer is null =
so
>> the pnfs_layoutcommit_inode call is skipped, and =A0pnfs_return_layo=
ut
>> is called. Since it is also a sync call, pnfs_layoutcommit_done does
>> not return until it is complete. So the layout is returned, the layo=
ut
>> is freed (all layout segments 'cause the range covers the whole file=
)
>> and pnfs_layoutcommit_inode returns to the FIRST __nfs4_close (from
>> nfs_file_release)
>>
>> Now since return-on-close is set, pnfs_layout_return is called (in t=
he
>> nfs_file_release __nfs4_close). This LAYOUTRETURN call will never go
>> on the wire because the first pnfs_layout_return (from the
>> pnfs_layoutcommit_done __nfs4_close) has removed the layout segment,
>> and all is well.
>>
>> Sheese.
>
> Yeah :-/
> I'm feeling increasingly uncomfortable with the current implementatio=
n
> and we're planning to come up with the state-machine based model
> for the Bakeathon and test it there. =A0Ideally, this will be stable =
enough
> that we can get the gist of it into pnfs-submit.
>
> Benny
>

Is there any document about the state-machine based model?
>>
>> --->Andy
>>
>>
>>>
>>> And about the multiple contexts I don't understand as well but the =
fact of it
>>> is that an nfs_inode has an list_head of open_contexts and we must =
deal properly
>>> with when that happens. (BTW with 2.6.34 Kernel it happens regularl=
y as in
>>> prev kernels it almost never trigger.)
>>>
>>> Benny's patch takes care of that and I've done heavy testing of it =
I can see cases
>>> of more then one contexts and it works correctly.
>>>
>>> But I too would like to understand when is it that an inode can hav=
e more then
>>> one open_context
>>>
>>> Boaz
>>>
>>>>
>>>>>
>>>>> Benny
>>>>>
>>>>>>
>>>>>> -->Andy
>>>>>>
>>>>>>>
>>>>>>> (For some reason I see that happening much more in 2.6.34
>>>>>>> =A0I don't understand why)
>>>>>>>
>>>>>>> Boaz
>>>>>>> ---
>>>>>>> git diff --stat -p -M
>>>>>>> =A0fs/nfs/nfs4state.c | =A0 =A02 +-
>>>>>>> =A01 files changed, 1 insertions(+), 1 deletions(-)
>>>>>>>
>>>>>>> diff --git a/fs/nfs/nfs4state.c b/fs/nfs/nfs4state.c
>>>>>>> index 15c8bc8..6dbe893 100644
>>>>>>> --- a/fs/nfs/nfs4state.c
>>>>>>> +++ b/fs/nfs/nfs4state.c
>>>>>>> @@ -590,7 +590,7 @@ static void __nfs4_close(struct path *path,=
 struct nfs4_state *state, fmode_t fm
>>>>>>> =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0struct nfs_inode *nfsi =3D NFS_I=
(state->inode);
>>>>>>>
>>>>>>> =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0if (nfsi->layoutcommit_ctx)
>>>>>>> - =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 pnfs_layoutcommit=
_inode(state->inode, 0);
>>>>>>> + =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 pnfs_layoutcommit=
_inode(state->inode, wait);
>>>>>>> =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0if (has_layout(nfsi) && nfsi->la=
yout.roc_iomode) {
>>>>>>> =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0struct nfs4_pnfs=
_layout_segment range;
>>>>>>>
>>>>>>> --
>>>>>>> To unsubscribe from this list: send the line "unsubscribe linux=
-nfs" in
>>>>>>> the body of a message to majordomo@vger.kernel.org
>>>>>>> More majordomo info at =A0http://vger.kernel.org/majordomo-info=
=2Ehtml
>>>>>>>
>>>>>
>>>> --
>>>> To unsubscribe from this list: send the line "unsubscribe linux-nf=
s" in
>>>> the body of a message to majordomo@vger.kernel.org
>>>> More majordomo info at =A0http://vger.kernel.org/majordomo-info.ht=
ml
>>>>
>>>
>>>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-nfs" =
in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at =A0http://vger.kernel.org/majordomo-info.html
>



--=20
Zhang Jingwang
National Research Centre for High Performance Computers
Institute of Computing Technology, Chinese Academy of Sciences
No. 6, South Kexueyuan Road, Haidian District
Beijing, China

  reply	other threads:[~2010-05-14  3:28 UTC|newest]

Thread overview: 12+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2010-05-13  8:40 SQUASHME: missing from FIXME: async layout return Boaz Harrosh
2010-05-13 14:07 ` William A. (Andy) Adamson
     [not found]   ` <AANLkTilE4ao7UadJD58EzbExdfwE4zU5DF-6Nit29Yjq-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2010-05-13 14:39     ` Benny Halevy
2010-05-13 14:19 ` William A. (Andy) Adamson
     [not found]   ` <AANLkTind3eXmuPw5HrKqB6bHJSSXFL3g2cDSEyyYc1n8-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2010-05-13 14:40     ` Benny Halevy
2010-05-13 15:17       ` William A. (Andy) Adamson
     [not found]         ` <AANLkTilhbkde-kAPGSBHT1b8vIZq4cM1SPpI-Rd0iOb7-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2010-05-13 15:31           ` Boaz Harrosh
2010-05-13 16:04             ` William A. (Andy) Adamson
2010-05-13 16:20               ` Benny Halevy
2010-05-14  3:28                 ` Zhang Jingwang [this message]
2010-05-17  6:02                   ` Benny Halevy
2010-05-13 16:23               ` Boaz Harrosh

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=AANLkTikG-QLid-R4Zt0UJ29bbimpYXjA8XEQ0AVNxMhr@mail.gmail.com \
    --to=yyalone@gmail.com \
    --cc=bhalevy@panasas.com \
    --cc=linux-nfs@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).