From: Leon Romanovsky <leon-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
To: Chuck Lever <chuck.lever-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>
Cc: Sagi Grimberg <sagi-NQWnxTmZq1alnMjI0IkVqw@public.gmane.org>,
linux-rdma <linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org>
Subject: Re: "memory management error" with NFS/RDMA on RoCE
Date: Wed, 5 Jul 2017 18:29:27 +0300
Message-ID: <20170705152927.GM1528@mtr-leonro.local>
In-Reply-To: <06510488-DB16-4781-8E5A-FDFFDDD00B4F-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>
On Wed, Jul 05, 2017 at 10:40:41AM -0400, Chuck Lever wrote:
>
> > On Jun 27, 2017, at 1:36 PM, Leon Romanovsky <leon-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org> wrote:
> >
> > On Tue, Jun 27, 2017 at 10:56:29AM -0400, Chuck Lever wrote:
> >> Hi Sagi-
> >>
> >>> On Jun 27, 2017, at 5:28 AM, Sagi Grimberg <sagi-NQWnxTmZq1alnMjI0IkVqw@public.gmane.org> wrote:
> >>>
> >>>
> >>>> While running xfstests on an NFS/RDMA mount, I see this in
> >>>> the client's /var/log/messages multiple times:
> >>>> Jun 22 14:13:45 manet kernel: mlx5_0:dump_cqe:275:(pid 0): dump error cqe
> >>>> Jun 22 14:13:45 manet kernel: 00000000 00000000 00000000 00000000
> >>>> Jun 22 14:13:45 manet kernel: 00000000 00000000 00000000 00000000
> >>>> Jun 22 14:13:45 manet kernel: 00000000 00000000 00000000 00000000
> >>>> Jun 22 14:13:45 manet kernel: 00000000 08007806 250000cd 024027d3
> >>>> Jun 22 14:13:45 manet kernel: rpcrdma: fastreg: memory management operation error (6/0x78)
> >>>> As far as I can tell the client is able to recover and continue
> >>>> the test. However, this error is not supposed to happen in normal
> >>>> operation.
> >>>> This is with a Mellanox CX4 in RoCEv1 mode, v4.12-rc2.
> >>>
> >>> Is this a regression?
> >>
> >> I can't answer that question with authority, because I just
> >> started trying out NFS/RDMA on RoCE with mlx5. But Robert has
> >> reported very similar symptoms with iSER on v4.9. It appears
> >> to have been around for a while, if these are the same.
> >>
> >>
> >>> What kernel version are you running?
> >>
> >> v4.12-rc2.
> >>
> >>
> >>> FW revision?
> >>
> >> 12.18.2000
> >>
> >>
> >>> Is the below commit applied?
> >>
> >> This commit does not appear to be applied to my kernel.
> >>
> >>
> >>> commit 6e8484c5cf07c7ee632587e98c1a12d319dacb7c
> >>> Author: Max Gurtovoy <maxg-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
> >>> Date: Sun May 28 10:53:11 2017 +0300
> >>>
> >>> RDMA/mlx5: set UMR wqe fence according to HCA cap
> >>>
> >>> Cache the needed umr_fence and set the wqe ctrl segment
> >>> accordingly.
> >>>
> >>> Signed-off-by: Max Gurtovoy <maxg-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
> >>> Acked-by: Leon Romanovsky <leon-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
> >>> Reviewed-by: Sagi Grimberg <sagi-NQWnxTmZq1alnMjI0IkVqw@public.gmane.org>
> >>> Signed-off-by: Doug Ledford <dledford-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
> >>>
> >>> This is the only thing that changed in that area
> >>> lately...
> >>>
> >>> Can you try without it?
> >>
> >> I haven't tried with it. I can pull it and see if it helps.
> >>
> >> I have tried:
> >>
> >> - with and without IOMMU enabled
> >> - with RoCE v1 and v2
> >> - with instrumentation:
> >>
> >> This can happen to any MR at any time after any number of
> >> uses. It does not appear to be "sticky" (i.e., xprtrdma
> >> recovery from a memory management error clears the problem
> >> successfully by releasing the MR and allocating a new one).
> >>
> >> So it feels like a f/w or driver problem to me, at this
> >> point.
> >
> > Jack and I discussed your issue tomorrow morning and we have a strong
> > feeling that it is FW.
>
> Hi Leon-
>
> Who is going to drive this issue to resolution? Do you need me
> to do something?

I don't think so. Jack was supposed to do it.

>
>
> > Thanks
> >
> >>
> >> --
> >> Chuck Lever
>
> --
> Chuck Lever
>
>
>
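
For anyone reading the CQE dump quoted above, here is a minimal sketch of how its last line can be lined up with the "(6/0x78)" in the rpcrdma message. It assumes that each printed word is the big-endian view of the 64-byte CQE and that the byte offsets follow the mlx5 error CQE layout (struct mlx5_err_cqe). The function name decode_err_cqe and the dictionary keys are illustrative only, not part of any driver or tool.

```python
import struct

# The four "dump error cqe" lines from the report, syslog prefix removed.
DUMP = """\
00000000 00000000 00000000 00000000
00000000 00000000 00000000 00000000
00000000 00000000 00000000 00000000
00000000 08007806 250000cd 024027d3
"""


def decode_err_cqe(dump):
    """Decode the tail of a 64-byte error CQE printed as 16 hex words."""
    words = [int(w, 16) for w in dump.split()]
    if len(words) != 16:
        raise ValueError("expected 16 words (64 bytes), got %d" % len(words))
    # Assumption: each printed word is the big-endian view of CQE memory.
    raw = struct.pack(">16I", *words)
    # Assumed offsets, modelled on struct mlx5_err_cqe (hypothetical here).
    opcode_qpn = struct.unpack_from(">I", raw, 56)[0]
    return {
        "vendor_err_synd": raw[54],
        "syndrome": raw[55],
        "wqe_opcode": opcode_qpn >> 24,
        "qpn": opcode_qpn & 0xFFFFFF,
        "wqe_counter": struct.unpack_from(">H", raw, 60)[0],
        "cqe_opcode": raw[63] >> 4,
    }


if __name__ == "__main__":
    for name, value in decode_err_cqe(DUMP).items():
        print("%-16s 0x%x" % (name, value))
```

Under those assumptions the syndrome byte decodes to 0x06 and the vendor syndrome to 0x78, which agrees with the "(6/0x78)" that rpcrdma logged for the same completion.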