From: Leon Romanovsky <leon@kernel.org>
To: Chuck Lever <chuck.lever@oracle.com>
Cc: Timo Rothenpieler <timo@rothenpieler.org>,
	Linux NFS Mailing List <linux-nfs@vger.kernel.org>,
	linux-rdma <linux-rdma@vger.kernel.org>
Subject: Re: NFS over RDMA issues on Linux 5.4
Date: Tue, 4 Aug 2020 18:55:54 +0300
Message-ID: <20200804155554.GD4432@unreal>
In-Reply-To: <B82C41F6-1C23-44F5-B802-621F6B63E12F@oracle.com>

On Tue, Aug 04, 2020 at 11:34:05AM -0400, Chuck Lever wrote:
>
>
> > On Aug 4, 2020, at 9:53 AM, Chuck Lever <chuck.lever@oracle.com> wrote:
> >
> >
> >
> >> On Aug 4, 2020, at 9:46 AM, Leon Romanovsky <leon@kernel.org> wrote:
> >>
> >> On Tue, Aug 04, 2020 at 09:12:55AM -0400, Chuck Lever wrote:
> >>>
> >>>
> >>>> On Aug 4, 2020, at 9:08 AM, Timo Rothenpieler <timo@rothenpieler.org> wrote:
> >>>>
> >>>> On 04.08.2020 14:49, Chuck Lever wrote:
> >>>>> Timo, I tend to think this is not a configuration issue.
> >>>>> Do you know of a known working kernel?
> >>>>
> >>>> This is a brand new system, it's never been running with any kernel older than 5.4, and downgrading it to 4.19 or something else while in operation is unfortunately not easily possible. For a client it would definitely not be out of the question, but the main nfs server I cannot easily downgrade.
> >>>>
> >>>> Also keep in mind that the dmesg spam happens on both server and client simultaneously.
> >>>
> >>> Let's start with the client only, since restarting it seems to clear the problem.
> >>
> >> It is client because according to the server CQE errors, it is Remote_Invalid_Request_Error
> >> with "9.7.5.2.2 NAK CODES" from IBTA.
> >
> > Thanks! OK, then let's use ftrace.
> >
> > Timo, can you install trace-cmd on your client? Then:
> >
> > 1. # trace-cmd record -e rpcrdma -e sunrpc
> >
> > 2. Trigger the problem
> >
> > 3. Control-C the trace-cmd, and copy the trace.dat file to another system
> >
> > 4. reboot your client
> >
> > Then send me your trace.dat. You don't have to cc the mailing lists.
>
> I see a LOC_LEN_ERR on a Receive. Leon, doesn't that mean the server's
> Send was too large?

1.
We have a local length error counter (resp_local_length_error); it would help to read it on the server and on the clients.
[leonro@vm ~]$ cat /sys/class/infiniband/ibp0s9/ports/1/hw_counters/resp_local_length_error
0

resp_local_length_error - "Number of times responder detected local length errors."
2.
LOC_LEN_ERR is consistent with what is written in the CQE error on the client.
This is what is written in our HW document:
 IB compliant completion with error syndrome
	0x1: Local_Length_Error
3.
From IBTA, 11.6.2 COMPLETION RETURN STATUS
Local Length Error - Generated for a Work Request posted to the local
Send Queue when the sum of the Data Segment lengths exceeds the message
length for the channel adapter port. Generated for a Work Request posted
to the local Receive Queue when the sum of the Data Segment lengths is
too small to receive a valid incoming message or the length of the incoming
message is greater than the maximum message size supported by the HCA port
that received the message.
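
To make the two cases in that definition concrete, here is a rough sketch in Python. It only restates the IBTA wording above; the function name and parameters are made up for illustration and do not correspond to any verbs API or driver code.

```python
from typing import List, Optional


def is_local_length_error(queue: str,
                          segment_lengths: List[int],
                          max_message_size: int,
                          incoming_length: Optional[int] = None) -> bool:
    """Restate the IBTA 11.6.2 Local Length Error conditions (illustrative only)."""
    total = sum(segment_lengths)

    if queue == "send":
        # Send Queue: the data segments add up to more than the port allows.
        return total > max_message_size

    if queue == "recv":
        if incoming_length is None:
            raise ValueError("incoming_length is required for the receive case")
        # Receive Queue: the posted buffers are too small for the incoming
        # message, or the incoming message exceeds what the port supports.
        return total < incoming_length or incoming_length > max_message_size

    raise ValueError("queue must be 'send' or 'recv'")
```

For example, a Receive posted with a single 4096-byte segment that gets an 8192-byte Send from the peer falls into the second case.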


So if "1" works :), we will be able to tell whether the client sends a WR
that is too large or receives a message that is too large.
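
If it is easier than running cat on every port, the counter from "1" can be collected with a small script. This is only a sketch: it assumes the sysfs layout shown in the cat command above, the helper names are made up, and devices whose driver does not expose hw_counters are simply skipped.

```python
import os
from typing import Dict

# Assumption: same sysfs layout as in the cat example above.
SYSFS_ROOT = "/sys/class/infiniband"


def read_hw_counter(name: str = "resp_local_length_error",
                    root: str = SYSFS_ROOT) -> Dict[str, int]:
    """Return {"<device>/<port>": value} for every port exposing the counter."""
    results: Dict[str, int] = {}
    if not os.path.isdir(root):
        return results
    for dev in sorted(os.listdir(root)):
        ports_dir = os.path.join(root, dev, "ports")
        if not os.path.isdir(ports_dir):
            continue
        for port in sorted(os.listdir(ports_dir)):
            path = os.path.join(ports_dir, port, "hw_counters", name)
            try:
                with open(path) as f:
                    results[f"{dev}/{port}"] = int(f.read().strip())
            except (OSError, ValueError):
                # Counter missing or unreadable on this port; skip it.
                continue
    return results


def counter_delta(before: Dict[str, int], after: Dict[str, int]) -> Dict[str, int]:
    """Per-port increase between two snapshots."""
    return {key: after[key] - before.get(key, 0) for key in after}


if __name__ == "__main__":
    for port, value in read_hw_counter().items():
        print(f"{port}: resp_local_length_error = {value}")
```

Take one snapshot before triggering the problem and one after, on the server and on the client, and compare them with counter_delta() to see which side the counter moves on.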

Thanks

>
> Timo, what filesystem are you sharing on your NFS server? The thing that
> comes to mind is https://bugzilla.kernel.org/show_bug.cgi?id=198053
>
>
> --
> Chuck Lever
>
>
>
