public inbox for linux-rdma@vger.kernel.org
From: Leon Romanovsky <leon@kernel.org>
To: Chuck Lever <chuck.lever@oracle.com>
Cc: Timo Rothenpieler <timo@rothenpieler.org>,
	Linux NFS Mailing List <linux-nfs@vger.kernel.org>,
	linux-rdma <linux-rdma@vger.kernel.org>
Subject: Re: NFS over RDMA issues on Linux 5.4
Date: Tue, 4 Aug 2020 18:55:54 +0300
Message-ID: <20200804155554.GD4432@unreal>
In-Reply-To: <B82C41F6-1C23-44F5-B802-621F6B63E12F@oracle.com>

On Tue, Aug 04, 2020 at 11:34:05AM -0400, Chuck Lever wrote:
>
>
> > On Aug 4, 2020, at 9:53 AM, Chuck Lever <chuck.lever@oracle.com> wrote:
> >
> >
> >
> >> On Aug 4, 2020, at 9:46 AM, Leon Romanovsky <leon@kernel.org> wrote:
> >>
> >> On Tue, Aug 04, 2020 at 09:12:55AM -0400, Chuck Lever wrote:
> >>>
> >>>
> >>>> On Aug 4, 2020, at 9:08 AM, Timo Rothenpieler <timo@rothenpieler.org> wrote:
> >>>>
> >>>> On 04.08.2020 14:49, Chuck Lever wrote:
> >>>>> Timo, I tend to think this is not a configuration issue.
> >>>>> Do you know of a known working kernel?
> >>>>
> >>>> This is a brand new system, it's never been running with any kernel older than 5.4, and downgrading it to 4.19 or something else while in operation is unfortunately not easily possible. For a client it would definitely not be out of the question, but the main nfs server I cannot easily downgrade.
> >>>>
> >>>> Also keep in mind that the dmesg spam happens on both server and client simultaneously.
> >>>
> >>> Let's start with the client only, since restarting it seems to clear the problem.
> >>
> >> It is client because according to the server CQE errors, it is Remote_Invalid_Request_Error
> >> with "9.7.5.2.2 NAK CODES" from IBTA.
> >
> > Thanks! OK, then let's use ftrace.
> >
> > Timo, can you install trace-cmd on your client? Then:
> >
> > 1. # trace-cmd record -e rpcrdma -e sunrpc
> >
> > 2. Trigger the problem
> >
> > 3. Control-C the trace-cmd, and copy the trace.dat file to another system
> >
> > 4. reboot your client
> >
> > Then send me your trace.dat. You don't have to cc the mailing lists.
>
> I see a LOC_LEN_ERR on a Receive. Leon, doesn't that mean the server's
> Send was too large?

1.
We have a resp_local_length_error counter; it would help to check it on
both the server and the clients.
[leonro@vm ~]$ cat /sys/class/infiniband/ibp0s9/ports/1/hw_counters/resp_local_length_error
0

resp_local_length_error - "Number of times responder detected local length errors."
2.
LOC_LEN_ERR matches what is reported in the CQE error on the client.
This is what our HW documentation says:
 IB compliant completion with error syndrome
	0x1: Local_Length_Error
3.
From IBTA, 11.6.2 COMPLETION RETURN STATUS
Local Length Error - Generated for a Work Request posted to the local
Send Queue when the sum of the Data Segment lengths exceeds the message
length for the channel adapter port. Generated for a Work Request posted
to the local Receive Queue when the sum of the Data Segment lengths is
too small to receive a valid incoming message or the length of the incoming
message is greater than the maximum message size supported by the HCA port
that received the message.


So if "1" works :), we will be able to distinguish whether the client
sends a WR that is too large or receives a message that is too large.
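
A minimal sketch of the check in "1", for running on the server and on
each client. It assumes the driver exposes hw_counters under sysfs in the
same layout as the path shown above; the function name dump_hw_counter and
its optional second argument (an alternative sysfs root) are made up for
this illustration and are not an existing tool.

```shell
#!/bin/sh
# Print one named hw counter for every RDMA device and port.
# $1: counter name (default: resp_local_length_error, as in point 1)
# $2: sysfs root   (assumption: defaults to /sys/class/infiniband)
dump_hw_counter() {
    counter=${1:-resp_local_length_error}
    root=${2:-/sys/class/infiniband}
    for f in "$root"/*/ports/*/hw_counters/"$counter"; do
        # An unmatched glob stays literal, so skip anything unreadable.
        [ -r "$f" ] || continue
        # Extract the device and port names from the path.
        dev=${f#"$root"/}
        dev=${dev%%/*}
        port=${f#"$root/$dev/ports/"}
        port=${port%%/*}
        printf '%s port %s %s=%s\n' "$dev" "$port" "$counter" "$(cat "$f")"
    done
}
```

Calling dump_hw_counter once before and once after triggering the problem,
and comparing the two outputs, shows which side's counter moved.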

Thanks

>
> Timo, what filesystem are you sharing on your NFS server? The thing that
> comes to mind is https://bugzilla.kernel.org/show_bug.cgi?id=198053
>
>
> --
> Chuck Lever
>
>
>
