From: Trond Myklebust <trondmy@hammerspace.com>
To: "bfields@fieldses.org" <bfields@fieldses.org>,
"aglo@umich.edu" <aglo@umich.edu>
Cc: "jiufei.xue@linux.alibaba.com" <jiufei.xue@linux.alibaba.com>,
"Anna.Schumaker@netapp.com" <Anna.Schumaker@netapp.com>,
"linux-nfs@vger.kernel.org" <linux-nfs@vger.kernel.org>,
"joseph.qi@linux.alibaba.com" <joseph.qi@linux.alibaba.com>
Subject: Re: [bug report] task hang while testing xfstests generic/323
Date: Mon, 11 Mar 2019 14:30:40 +0000
Message-ID: <ace3659edd3efb70d3d978e108ecf1fc22ee3966.camel@hammerspace.com>
In-Reply-To: <CAN-5tyEPSDKao=83sW+voU9SPbcszZyR+XfZzcsD2-ozVa22Xg@mail.gmail.com>

Hi Olga,

On Sun, 2019-03-10 at 18:20 -0400, Olga Kornievskaia wrote:
> There are a bunch of cases where multiple operations are using the
> same seqid and slot.
>
> One example of such weirdness is (nfs.seqid == 0x000002f4) &&
> (nfs.slotid == 0), which is also the one leading to the hang.
>
> In frame 415870, there is an OPEN using that seqid and slot for the
> first time (but this slot will be re-used a bunch of times before it
> gets a reply in frame 415908 with the open stateid seq=40). (This
> packet also has an example of slot=1 + seqid=0x000128f7 being reused
> by both a TEST_STATEID and an OPEN, but let's set that aside.)
>
> In frame 415874 (in the same packet), the client sends 5 OPENs on the
> SAME seqid and slot (all have distinct xids). In a way that ends up
> being all right, since the OPENs are for the same file and are thus
> answered out of the reply cache, and the reply is ERR_DELAY. But in
> frame 415876, the client again uses the same seqid and slot, and in
> this case it is used by 3 OPENs and a TEST_STATEID.
>
> In all this mess, the client never processes the open stateid seq=40
> and keeps on resending CLOSE with seq=37 (note that the client also
> "missed" processing seqid=38 and 39: 39 probably because it was a
> reply on the same kind of "reused" slot=1 and seq=0x000128f7. I
> haven't tracked 38 but I'm assuming it's the same). I don't know how
> many times, but after 5 minutes I see a TEST_STATEID that again uses
> the same seqid+slot (which gets a reply from the cache matching the
> OPEN). There is also an OPEN + CLOSE (still with seq=37); the OPEN is
> replied to, but after this the client logs
> "nfs4_schedule_state_manager: kthread_run: -4" over and over again,
> and then goes into a soft lockup.
>
> Looking back at slot 0: nfs.seqid=0x000002f3 was used in frame 415866
> by the TEST_STATEID. This is replied to in frame 415877 (with an
> ERR_DELAY). But before the client got that reply, it had already
> re-used the slot and the seq in frame 415874. TEST_STATEID is a
> synchronous and interruptible operation. I suspect that it was
> somehow interrupted, and that's how the slot was able to be re-used
> by frame 415874. But how the several OPENs were able to get the same
> slot, I don't know.
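
As an aside, the pattern described above (several distinct xids
sharing one slot and seqid) can be checked for mechanically. The
following is only a minimal sketch and is not part of the original
report. It assumes the SEQUENCE calls have already been extracted from
the capture into records with the hypothetical keys "frame", "xid",
"slotid", "seqid" and "op". The xid values in the sample data are
invented for illustration; only the frame numbers, slot and seqid
values come from the description above. A retransmission re-uses its
xid, so only distinct xids on the same (slot, seqid) pair are flagged.

```python
from collections import defaultdict


def find_reused_slots(calls):
    """Return {(slotid, seqid): sorted xids} for pairs used by >1 xid.

    A plain retransmission keeps its xid, so it is not reported here;
    only different RPCs sharing one (slot, seqid) pair are.
    """
    xids_by_key = defaultdict(set)
    for call in calls:
        xids_by_key[(call["slotid"], call["seqid"])].add(call["xid"])
    return {
        key: sorted(xids)
        for key, xids in xids_by_key.items()
        if len(xids) > 1
    }


# Hypothetical sample records: the xid values are made up, and only
# two of the five OPENs reported for frame 415874 are listed.
calls = [
    {"frame": 415866, "xid": 1, "slotid": 0, "seqid": 0x2F3, "op": "TEST_STATEID"},
    {"frame": 415870, "xid": 2, "slotid": 0, "seqid": 0x2F4, "op": "OPEN"},
    {"frame": 415874, "xid": 3, "slotid": 0, "seqid": 0x2F4, "op": "OPEN"},
    {"frame": 415874, "xid": 4, "slotid": 0, "seqid": 0x2F4, "op": "OPEN"},
]

reused = find_reused_slots(calls)
for (slotid, seqid), xids in reused.items():
    print(f"slot {slotid} seqid {seqid:#010x} used by xids {xids}")
```

With this sample data only slot 0 with seqid 0x000002f4 is reported,
since seqid 0x000002f3 appears with a single xid.
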
Is this still true with the current linux-next? I would expect this
patch

http://git.linux-nfs.org/?p=trondmy/linux-nfs.git;a=commitdiff;h=3453d5708b33efe76f40eca1c0ed60923094b971

to change the Linux client behaviour in the above regard.

Cheers
Trond
--
Trond Myklebust
Linux NFS client maintainer, Hammerspace
trond.myklebust@hammerspace.com