From: "Steve Wise" <swise-7bPotxP6k4+P2YhJcF5u+vpXobYPEAuW@public.gmane.org>
To: 'Olga Kornievskaia'
<aglo-63aXycvo3TyHXe+LvDLADg@public.gmane.org>,
'linux-rdma' <linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org>
Subject: RE: krping problem on 4.15-rc4
Date: Wed, 10 Jan 2018 14:10:47 -0600
Message-ID: <00ff01d38a4f$1a979eb0$4fc6dc10$@opengridcomputing.com>
In-Reply-To: <CAN-5tyH1HO7yzzQLyb5z5Pq=OrHnKzmCrR2MffLguqsEA-mwWg-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
> Hi folks,
>
> I have 2 Linux machines with CX-5 cards (Mellanox MCX515A-CCAT (one
> port)) and krping works in one direction but not in the other.
> rping works in both directions. ib_send_bw works in both directions and
> displays 39Gb/s one way and 36Gb/s the other way on a 40Gb setup.
>
> krping is upstream commit 4df520c888d80e5370d0f58b2eeac8355e3f2286.
>
> Server is started with: [kolga@localhost krping]$ sudo echo
> "server,port=9999,addr=172.20.35.191,count=10,verbose" > /proc/krping
> And it displays in /var/log/messages:
> Jan 4 14:23:29 localhost kernel: mlx5_0:dump_cqe:277:(pid 0): dump error cqe
> Jan 4 14:23:29 localhost kernel: 00000000 00000000 00000000 00000000
> Jan 4 14:23:29 localhost kernel: 00000000 00000000 00000000 00000000
> Jan 4 14:23:29 localhost kernel: 00000000 00000000 00000000 00000000
> Jan 4 14:23:29 localhost kernel: 00000000 93003204 10000122 0005bfd2
> Jan 4 14:23:29 localhost kernel: krping: cq completion failed with wr_id 0 status 4 opcode 128 vender_err 32
> Jan 4 14:23:29 localhost kernel: krping: cq completion in ERROR state
> Jan 4 14:23:29 localhost kernel: krping: wait for RDMA_READ_COMPLETE state 10
>
> Client is run with: [kolga@sti-rx200-231-d1 ~]$ sudo echo
> "client,addr=172.20.35.191,port=9999,verbose,count=10" > /proc/krping
> And in /var/log/messages:
> Jan 4 14:19:27 localhost kernel: krping: DISCONNECT EVENT...
> Jan 4 14:19:27 localhost kernel: krping: wait for RDMA_WRITE_ADV state 10
> Jan 4 14:19:28 localhost kernel: krping: cq completion in ERROR state
>
> In the network trace I see (over RRoCE):
> CM: ConnectRequest
> CM: ConnectReply
> CM: ReadyToUse
> RC Send Only QP
> RC Ack
> RC RDMA Read Request
> RC RDMA Read Response Only
> CM: DisconnectRequest
> CM: DisconnectReply
>
> I previously submitted this to Mellanox, but they told me to
> resubmit it to the linux-rdma list. They also said their engineering
> team did look at the CQE error and that its meaning was:
> PD (protection domain) violation - error in fetch data in rxs in pd
> (send opcodes/ read respond / atomic ack).
Hey Olga,
Are the machines running the same kernel version and distro software, and do they have the same hardware (CPU, motherboard, memory, etc.)? If not, what is different about them? Is it the krping server that sees the CQ error? Do other RDMA devices work on these systems?
Thanks,
Steve.
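The last line of the mlx5 "dump error cqe" output can be decoded by hand. The sketch below is a minimal illustration that assumes the 4.15-era `struct mlx5_err_cqe` layout and constants from the kernel's include/linux/mlx5/device.h (vendor_err_synd at byte 54, syndrome at byte 55, then s_wqe_opcode_qpn, wqe_counter, signature and op_own); the helper name `decode_err_cqe_tail` is hypothetical. Under those assumptions the dump reads as a requester-side error on an RDMA READ WQE with a local protection syndrome (0x04), which maps to `IB_WC_LOC_PROT_ERR` (status 4) and lines up with the RDMA Read Response in the trace and with Mellanox's description. Per the ibv_poll_cq(3) documentation, only wr_id, status, qp_num and vendor_err are valid in an error completion, so the `opcode 128` in the krping line is probably not meaningful.

```python
# Assumed constants from the kernel's include/linux/mlx5/device.h and
# include/rdma/ib_verbs.h (4.15 era); verify against your own tree.
MLX5_CQE_REQ_ERR = 13                 # requester (send-side) error CQE
MLX5_CQE_RESP_ERR = 14                # responder (receive-side) error CQE
MLX5_OPCODE_RDMA_READ = 0x10
MLX5_CQE_SYNDROME_LOCAL_PROT_ERR = 0x04
IB_WC_LOC_PROT_ERR = 4                # value krping printed as "status 4"


def decode_err_cqe_tail(line: str) -> dict:
    """Decode the last dump line (CQE bytes 48-63) of an mlx5 error CQE.

    Each word is one big-endian 32-bit dword, as printed by the kernel.
    Hypothetical helper; field offsets follow struct mlx5_err_cqe.
    """
    d = [int(w, 16) for w in line.split()]
    if len(d) != 4:
        raise ValueError("expected four 32-bit hex words")
    return {
        "vendor_err_synd": (d[1] >> 8) & 0xFF,   # byte 54
        "syndrome": d[1] & 0xFF,                 # byte 55
        "wqe_opcode": d[2] >> 24,                # byte 56
        "qpn": d[2] & 0xFFFFFF,                  # bytes 57-59
        "wqe_counter": d[3] >> 16,               # bytes 60-61
        "cqe_opcode": (d[3] & 0xFF) >> 4,        # high nibble of op_own
    }


if __name__ == "__main__":
    # Last line of the server's "dump error cqe" output.
    f = decode_err_cqe_tail("00000000 93003204 10000122 0005bfd2")
    kind = {MLX5_CQE_REQ_ERR: "requester error",
            MLX5_CQE_RESP_ERR: "responder error"}.get(f["cqe_opcode"], "other")
    print(f"cqe opcode {f['cqe_opcode']} ({kind})")
    wqe = "RDMA READ" if f["wqe_opcode"] == MLX5_OPCODE_RDMA_READ else "other"
    print(f"wqe opcode {f['wqe_opcode']:#04x} ({wqe}), qpn {f['qpn']:#x}, "
          f"wqe_counter {f['wqe_counter']}")
    print(f"syndrome {f['syndrome']:#04x}, "
          f"vendor_err_synd {f['vendor_err_synd']:#04x}")
```

The vendor syndrome decodes to 0x32, which matches krping's `vender_err 32` if krping prints that field in hexadecimal.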
Thread overview: 16+ messages
2018-01-09 15:30 krping problem on 4.15-rc4 Olga Kornievskaia
2018-01-10 20:10 ` Steve Wise [this message]
2018-01-11 18:18 ` Olga Kornievskaia
2018-01-11 19:45 ` Steve Wise
2018-01-12 22:06 ` Olga Kornievskaia
2018-01-13 0:07 ` Steve Wise
2018-01-16 19:50 ` Olga Kornievskaia
2018-01-16 21:14 ` Olga Kornievskaia
2018-01-17 21:03 ` Doug Ledford
2018-01-17 22:03 ` Olga Kornievskaia
2018-01-18 16:13 ` Olga Kornievskaia
2018-01-19 11:08 ` Leon Romanovsky
2018-01-19 12:21 ` Majd Dibbiny
2018-01-19 13:57 ` Olga Kornievskaia
2018-01-19 21:07 ` Olga Kornievskaia
2018-01-19 15:53 ` Steve Wise