From: Jason Gunthorpe <jgg@nvidia.com>
To: Bob Pearson <rpearsonhpe@gmail.com>, Parav Pandit <parav@nvidia.com>
Cc: Leon Romanovsky <leon@kernel.org>, <linux-rdma@vger.kernel.org>
Subject: Re: pyverbs failures
Date: Fri, 28 Aug 2020 15:44:58 -0300
Message-ID: <20200828184458.GS1152540@nvidia.com>
In-Reply-To: <0c4cef74-21bf-19b5-1523-6fffa450e764@gmail.com>
On Fri, Aug 28, 2020 at 11:51:07AM -0500, Bob Pearson wrote:

> I have been trying to reduce the number of test failures in the
> pyverbs tests for the rxe driver. There is one class of these errors
> that seems to be potentially a design failure in rdma core. By
> default each time a new RoCE device is registered the core sets up a
> GID table in cache.c and populates the first GID entry with the
> EUI-64 version of the IPv6 link-local address. Later the other IP
> addresses configured on each port are added as well. It is expected
> that the default entry with sgid_index = 0 will function as a valid
> source address. Five years ago this probably always worked, but more
> modern OSes have stopped using this address for privacy
> reasons. Ubuntu 20.04, which is the one I am working on, uses a pseudo
> random address and not the MAC-based one. Windows and iOS also
> apparently no longer use this address. The result is that the
> pyverbs test cases which use sgid_index = 0 in some cases, and use
> random sgid_indices including 0 in others, fail. The most common
> failure symptom is that when attempting to add a remote address to a
> QP (INIT -> RTR) it is unable to contact the invalid address and it
> times out.

The RoCEv1 GID is formed as you described above. Is rxe triggering
some RoCEv1 support that it can't handle?

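For illustration only, a minimal sketch of the MAC-derived default GID
discussed above, assuming the standard fe80::/64 prefix plus the
modified EUI-64 interface identifier of RFC 4291, Appendix A. The
helper names are hypothetical; this is not the code in cache.c. The
second helper shows why the default entry can go stale: a privacy or
stable-privacy link-local address will not equal the MAC-derived one.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: build the fe80::/64 link-local GID from a
 * 48-bit MAC using the modified EUI-64 interface identifier. */
static void mac_to_eui64_gid(const uint8_t mac[6], uint8_t gid[16])
{
	memset(gid, 0, 16);
	gid[0] = 0xfe;			/* fe80::/64 link-local prefix */
	gid[1] = 0x80;
	gid[8] = mac[0] ^ 0x02;		/* flip the universal/local bit */
	gid[9] = mac[1];
	gid[10] = mac[2];
	gid[11] = 0xff;			/* 0xfffe inserted mid-MAC */
	gid[12] = 0xfe;
	gid[13] = mac[3];
	gid[14] = mac[4];
	gid[15] = mac[5];
}

/* Hypothetical check: is the configured IPv6 address exactly the
 * MAC-derived EUI-64 link-local address? */
static bool addr_is_eui64_of(const uint8_t addr[16], const uint8_t mac[6])
{
	uint8_t gid[16];

	mac_to_eui64_gid(mac, gid);
	return memcmp(addr, gid, 16) == 0;
}
```

For example, MAC 52:54:00:12:34:56 maps to fe80::5054:ff:fe12:3456.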
> A better choice for the default GID for RoCEv2 devices may be to
> just use the IPv6 address configured as the link-local address for
> the ndev. If they use the EUI-64 address the result will be the
> same. At least some of these OSes claim that the link-local address
> is temporary, changing periodically. This would require tracking
> IPv6.

RoCEv2 devices certainly shouldn't have GIDs that don't match their IP
addresses. Otherwise they would produce a malformed UDP header.

Maybe Parav remembers whether there is some tricky reason this is
still being done?

Jason