From: Jason Gunthorpe <jgg@nvidia.com>
To: Bob Pearson <rpearsonhpe@gmail.com>, Parav Pandit <parav@nvidia.com>
Cc: Leon Romanovsky <leon@kernel.org>, <linux-rdma@vger.kernel.org>
Subject: Re: pyverbs failures
Date: Fri, 28 Aug 2020 15:44:58 -0300
Message-ID: <20200828184458.GS1152540@nvidia.com>
In-Reply-To: <0c4cef74-21bf-19b5-1523-6fffa450e764@gmail.com>

On Fri, Aug 28, 2020 at 11:51:07AM -0500, Bob Pearson wrote:

> I have been trying to reduce the number of test failures in the
> pyverbs tests for the rxe driver. There is one class of these errors
> that seems to be potentially a design failure in rdma core. By
> default each time a new RoCE device is registered the core sets up a
> gid table in cache.c and populates the first gid entry with the
> eui64 version of the IPV6 link local address. Later the other IP
> addresses configured on each port are added as well. It is expected
> that the default entry with sgid_index = 0 will function as a valid
> source address. Five years ago this probably always worked but more
> modern OSes have stopped using this address for privacy
> reasons. Ubuntu 20.04 which is the one I am working on uses a pseudo
> random address and not the MAC based one. Windows and IOS also
> apparently no longer use this address. The result is that the
> pyverbs test cases which use sgid_index = 0 in some cases, and use
> random sgid_indices including 0 in others, fail. The most common
> failure symptom is that when attempting to add a remote address to a
> QP (INIT -> RTR) it is unable to contact the invalid address and it
> times out.

The RoCEv1 GID is formed as you described above. Is rxe triggering
some RoCEv1 support that it can't handle?
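
For reference, the eui64 form of the link local address mentioned above
can be sketched as follows. This is only an illustration of the standard
modified EUI-64 construction (flip the universal/local bit of the MAC,
insert ff:fe in the middle, prepend fe80::/64). The function name
eui64_link_local_gid is hypothetical, and this is not the code in
cache.c.

```python
import ipaddress


def eui64_link_local_gid(mac: str) -> ipaddress.IPv6Address:
    """Hypothetical helper: build fe80::/64 + modified EUI-64 from a MAC."""
    octets = bytes(int(part, 16) for part in mac.split(":"))
    if len(octets) != 6:
        raise ValueError("expected a 48-bit MAC address")

    # Interface identifier: flip the universal/local bit of the first
    # octet and insert ff:fe between the two halves of the MAC.
    iid = bytes([octets[0] ^ 0x02]) + octets[1:3] + b"\xff\xfe" + octets[3:6]

    # Link local prefix fe80::/64 is fe80 followed by 48 zero bits.
    prefix = bytes.fromhex("fe80") + bytes(6)
    return ipaddress.IPv6Address(prefix + iid)


if __name__ == "__main__":
    print(eui64_link_local_gid("00:11:22:33:44:55"))
```

If the OS configures a privacy or pseudo random link local address
instead, the value computed this way will not appear among the
addresses on the netdev.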

> A better choice for the default GID for RoCEv2 devices may be to
> just use the IPV6 address configured as the link local address for
> the ndev. If they use the eui64 address the result will be the
> same. At least some of these OSes claim that the link local address
> is temporary, changing periodically. This would require tracking
> IPV6.

Certainly RoCEv2 devices shouldn't have GIDs that do not match
their IP addresses. Otherwise they would produce a malformed UDP header.
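
A minimal sketch of that consistency check, assuming the GID and the
addresses configured on the netdev are available as strings. The
function name gid_matches_configured_address is hypothetical, and the
IPv4 case assumes the usual IPv4-mapped form (::ffff:a.b.c.d) used for
RoCEv2 GIDs.

```python
import ipaddress


def gid_matches_configured_address(gid: str, configured: list[str]) -> bool:
    """Hypothetical check: does this GID correspond to a configured IP?"""
    gid_addr = ipaddress.IPv6Address(gid)
    for addr in configured:
        ip = ipaddress.ip_address(addr)
        if ip.version == 4:
            # Compare IPv4 addresses in their IPv4-mapped IPv6 form.
            ip = ipaddress.IPv6Address(bytes(10) + b"\xff\xff" + ip.packed)
        if ip == gid_addr:
            return True
    return False


if __name__ == "__main__":
    # MAC-derived default GID versus a pseudo random link local address.
    print(gid_matches_configured_address(
        "fe80::211:22ff:fe33:4455", ["fe80::a1b2:c3d4:e5f6:789"]))
```

A test that picks sgid_index = 0 without a check like this may end up
using a source address that the host never configured.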

Maybe Parav remembers if there is some tricky reason why this is still
being done?

Jason