From: Jiufei Xue <jiufei.xue@linux.alibaba.com>
To: Trond Myklebust <trondmy@hammerspace.com>,
"aglo@umich.edu" <aglo@umich.edu>
Cc: "bfields@fieldses.org" <bfields@fieldses.org>,
"Anna.Schumaker@netapp.com" <Anna.Schumaker@netapp.com>,
"linux-nfs@vger.kernel.org" <linux-nfs@vger.kernel.org>,
"joseph.qi@linux.alibaba.com" <joseph.qi@linux.alibaba.com>
Subject: Re: [bug report] task hang while testing xfstests generic/323
Date: Fri, 1 Mar 2019 13:19:54 +0800
Message-ID: <c4b68217-275d-d40c-1b7e-71fddf91f830@linux.alibaba.com>
In-Reply-To: <dae18b965a55ed36071b5296d6b1466a57878d16.camel@hammerspace.com>
On 2019/3/1 7:56 AM, Trond Myklebust wrote:
> On Thu, 2019-02-28 at 17:26 -0500, Olga Kornievskaia wrote:
>> On Thu, Feb 28, 2019 at 5:11 AM Jiufei Xue <jiufei.xue@linux.alibaba.com> wrote:
>>> Hi,
>>>
>>> when I tested xfstests/generic/323 with NFSv4.1 and v4.2, the task
>>> changed to zombie occasionally while a thread is hanging with the
>>> following stack:
>>>
>>> [<0>] rpc_wait_bit_killable+0x1e/0xa0 [sunrpc]
>>> [<0>] nfs4_do_close+0x21b/0x2c0 [nfsv4]
>>> [<0>] __put_nfs_open_context+0xa2/0x110 [nfs]
>>> [<0>] nfs_file_release+0x35/0x50 [nfs]
>>> [<0>] __fput+0xa2/0x1c0
>>> [<0>] task_work_run+0x82/0xa0
>>> [<0>] do_exit+0x2ac/0xc20
>>> [<0>] do_group_exit+0x39/0xa0
>>> [<0>] get_signal+0x1ce/0x5d0
>>> [<0>] do_signal+0x36/0x620
>>> [<0>] exit_to_usermode_loop+0x5e/0xc2
>>> [<0>] do_syscall_64+0x16c/0x190
>>> [<0>] entry_SYSCALL_64_after_hwframe+0x44/0xa9
>>> [<0>] 0xffffffffffffffff
>>>
>>> Since commit 12f275cdd163(NFSv4: Retry CLOSE and DELEGRETURN on
>>> NFS4ERR_OLD_STATEID), the client will retry to close the file when
>>> stateid generation number in client is lower than server.
>>>
>>> The original intention of this commit is retrying the operation while
>>> racing with an OPEN. However, in this case the stateid generation remains
>>> mismatch forever.
>>>
>>> Any suggestions?
>>
>> Can you include a network trace of the failure? Is it possible that
>> the server has crashed on reply to the close and that's why the task
>> is hung? What server are you testing against?
>>
>> I have seen trace where close would get ERR_OLD_STATEID and would
>> still retry with the same open state until it got a reply to the OPEN
>> which changed the state and when the client received reply to that,
>> it'll retry the CLOSE with the updated stateid.
>
> I agree with Olga's assessment. The server is not allowed to randomly
> change the values of the seqid, and the client should be taking pains
> to replay any OPEN calls for which a reply is missed. The expectation
> is therefore that NFS4ERR_OLD_STATEID should always be a temporary
> state.
>
The server bumped the seqid because of a new OPEN from another thread.
I suspect that the new OPEN task exited on receiving a signal without
updating the stateid.
> If it is not, then the bugreport needs to explain why the server bumped
> the seqid without informing the client.
>
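To make the suspected sequence concrete, here is a minimal user-space sketch
of it. It is a toy model only, not the kernel code path: the Server and
Client classes, the reply_processed flag, and the max_retries cap are all
hypothetical names introduced for illustration. The real client has no such
cap, which is why the task hangs instead of giving up.

```python
from dataclasses import dataclass

NFS4_OK = "NFS4_OK"
NFS4ERR_OLD_STATEID = "NFS4ERR_OLD_STATEID"


@dataclass
class Server:
    """Toy server tracking the seqid of a single open stateid."""
    seqid: int = 1

    def open(self) -> int:
        # Every OPEN on the same open state bumps the stateid seqid.
        self.seqid += 1
        return self.seqid

    def close(self, seqid: int) -> str:
        # A CLOSE carrying an older seqid than the server's is rejected.
        if seqid < self.seqid:
            return NFS4ERR_OLD_STATEID
        return NFS4_OK


@dataclass
class Client:
    """Toy client caching the last stateid seqid it has seen."""
    seqid: int = 1

    def open(self, server: Server, reply_processed: bool = True) -> None:
        new_seqid = server.open()
        # Assumption under test: if the OPEN task is killed by a signal,
        # the reply is never applied and the cached seqid stays stale.
        if reply_processed:
            self.seqid = new_seqid

    def close(self, server: Server, max_retries: int = 5):
        # Retry CLOSE on NFS4ERR_OLD_STATEID, as done since commit
        # 12f275cdd163. The cap exists only so this sketch terminates.
        for attempt in range(1, max_retries + 1):
            if server.close(self.seqid) == NFS4_OK:
                return attempt
        return None  # every retry saw NFS4ERR_OLD_STATEID


def simulate(reply_processed: bool):
    """Run one OPEN followed by CLOSE; return attempts used, or None."""
    server, client = Server(), Client()
    client.open(server, reply_processed)
    return client.close(server)
```

With the OPEN reply applied, simulate(True) succeeds on the first CLOSE.
With the reply lost, simulate(False) never succeeds, because nothing ever
refreshes the client's cached seqid between retries.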