linux-nfs.vger.kernel.org archive mirror
From: Mike Javorski <mike.javorski@gmail.com>
To: NeilBrown <neilb@suse.de>
Cc: linux-nfs@vger.kernel.org
Subject: Re: NFS server regression in kernel 5.13 (tested w/ 5.13.9)
Date: Sat, 14 Aug 2021 18:23:46 -0700
Message-ID: <CAOv1SKDDOj5UeUwztrMSNJnLgSoEgD8OU55hqtLHffHvaCQzzA@mail.gmail.com>
In-Reply-To: <CAOv1SKB_dsam7P9pzzh_SKCtA8uE9cyFdJ=qquEfhLT42-szPA@mail.gmail.com>

I managed to get a capture with several discrete freezes in it, and I
included a chunk with 5 of them in the span of ~3000 frames. I added
a packet comment at each frame that the tshark command reported as > 1
sec RPC wait. Just search for "Freeze" in the packet details in
(wire|t)shark. This is with kernel 5.13.10 as provided by Arch (see
https://github.com/archlinux/linux/compare/a37da2be8e6c85...v5.13.10-arch1
for the diff vs. mainline; nothing NFS/RPC related that I can identify).

I tried to reproduce the failures with the 5.12.15 kernel but could not.

https://drive.google.com/file/d/1T42iX9xCdF9Oe4f7JXsnWqD8oJPrpMqV/view?usp=sharing

File should be downloadable anonymously.

- mike

On Thu, Aug 12, 2021 at 7:53 PM Mike Javorski <mike.javorski@gmail.com> wrote:
>
> The "semi-known-good" has been the client. I tried updating the server
> multiple times to a 5.13 kernel and each time had to downgrade to the
> last 5.12 kernel that ArchLinux released (5.12.15) to stabilize
> performance. At each attempt, the client was running the same 5.13
> kernel that was being deployed to the server. I never downgraded the
> client.
>
> Thank you for the scripts and all the details, I will test things out
> this weekend when I can dedicate time to it.
>
> - mike
>
> On Thu, Aug 12, 2021 at 7:39 PM NeilBrown <neilb@suse.de> wrote:
> >
> > On Fri, 13 Aug 2021, Mike Javorski wrote:
> > > Neil:
> > >
> > > Apologies for the delay, your message didn't get properly flagged in my email.
> >
> > :-)
> >
> > >
> > > To answer your questions, both client (my Desktop PC) and server (my
> > > NAS) are running ArchLinux; client w/ current kernel (5.13.9), server
> > > w/ current or alternate testing kernels (see below).
> >
> > So the bug could be in the server or the client.  I assume you are
> > careful to test a client against a known-good server, or a server against
> > a known-good client.
> >
> > > I intend to spend some time this weekend attempting to get the
> > > tcpdump. My initial attempts wound up with 400+ MB files which would
> > > be difficult to ship and use for diagnostics.
> >
> > Rather than you sending me the dump, I'll send you the code.
> >
> > Run
> >   tshark -r filename -d tcp.port==2049,rpc -Y 'tcp.port==2049 && rpc.time > 1'
> >
> > This will ensure the NFS traffic is actually decoded as NFS and then
> > report only NFS(rpc) replies that arrive more than 1 second after the
> > request.
> > You can add
> >
> >     -T fields -e frame.number -e rpc.time
> >
> > to find out what the actual delay was.
> >
> > If it reports any, that will be interesting.  Try with a larger time if
> > necessary to get a modest number of hits.  Using editcap and the given
> > frame number you can select out 1000 packets either side of the problem
> > and that should compress to be small enough to transport.
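The "1000 packets either side" step quoted above could be sketched roughly as follows. The helper name frame_window, the file names capture.pcap and slice.pcap, and frame number 48213 are all made up for illustration; the only editcap behaviour assumed is that -r keeps the listed frame range instead of deleting it.

```shell
#!/bin/sh
# Hypothetical helper (not from the thread): print an editcap-style frame
# range covering `margin` frames either side of `frame`, clamped at frame 1.
frame_window() {
    frame=$1
    margin=${2:-1000}            # default: 1000 frames either side
    start=$((frame - margin))
    if [ "$start" -lt 1 ]; then
        start=1                  # frame numbers start at 1
    fi
    echo "${start}-$((frame + margin))"
}

# Example use, assuming capture.pcap is the full dump and frame 48213 was
# one of the frames reported by the rpc.time filter:
#   editcap -r capture.pcap slice.pcap "$(frame_window 48213)"
#   xz slice.pcap
```

If several slow replies sit close together, one wider margin around the first of them is probably simpler than stitching several slices together.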
> >
> > However it might not find anything.  If the reply never arrives, you'll
> > never get a reply with a long timeout.  So we need to check that
> > everything got a reply...
> >
> >  tshark -r filename -d tcp.port==2049,rpc  \
> >    -Y 'tcp.port==2049 && rpc.msg == 0' -T fields \
> >    -e rpc.xid -e frame.number | sort > /tmp/requests
> >
> >  tshark -r filename -d tcp.port==2049,rpc  \
> >    -Y 'tcp.port==2049 && rpc.msg == 1' -T fields \
> >    -e rpc.xid -e frame.number | sort > /tmp/replies
> >
> >  join -a1 /tmp/requests /tmp/replies | awk 'NF==2'
> >
> > This should list the xid and frame number of all requests that didn't
> > get a reply.  Again, editcap can extract a range of frames into a file of
> > manageable size.
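To see what the join/awk step quoted above does without a real capture, here is a small sketch on hand-written data. The function name unanswered and the sample xids are invented for illustration; the logic is meant to be the same as the quoted pipeline: join -a1 also prints calls that have no matching reply, and awk 'NF==2' keeps only those lines, since a matched call carries a third field (the reply's frame number).

```shell
#!/bin/sh
# Hypothetical helper: given two files of "xid frame" lines (calls, then
# replies), print the calls whose xid never appears among the replies.
unanswered() {
    req_sorted=$(mktemp)
    rep_sorted=$(mktemp)
    # join needs both inputs sorted on the join field, in the same locale.
    LC_ALL=C sort "$1" > "$req_sorted"
    LC_ALL=C sort "$2" > "$rep_sorted"
    # -a1: also emit unpairable lines from the first file (the calls).
    # Paired lines have 3 fields (xid, call frame, reply frame);
    # unpaired calls have only 2.
    LC_ALL=C join -a1 "$req_sorted" "$rep_sorted" | awk 'NF==2'
    rm -f "$req_sorted" "$rep_sorted"
}

# Example with made-up data: three calls, only two of them answered.
#   printf '%s\n' '0x0000000a 10' '0x0000000b 12' '0x0000000c 15' > requests
#   printf '%s\n' '0x0000000a 11' '0x0000000c 16' > replies
#   unanswered requests replies
```

One caveat, stated as an assumption rather than something from the thread: if a call is retransmitted, the same xid appears more than once in the requests list, so the output is best treated as a list of candidates to inspect.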
> >
> > Another possibility is that requests are getting replies, but the reply
> > says "NFS4ERR_DELAY"
> >
> >  tshark -r filename -d tcp.port==2049,rpc -Y nfs.nfsstat4==10008
> >
> > should report any reply with that error code.
> >
> > Hopefully something there will be interesting.
> >
> > NeilBrown
