Linux RDMA and InfiniBand development
From: Christopher Lameter <cl@linux.com>
To: Jens Domke <jens.domke@riken.jp>
Cc: linux-rdma@vger.kernel.org
Subject: Re: Is there a working cache for path record and lids etc for librdmacm?
Date: Tue, 17 Nov 2020 14:20:22 +0000 (UTC)
Message-ID: <alpine.DEB.2.22.394.2011171418050.206345@www.lameter.com>
In-Reply-To: <bbaa9827-fed4-492b-5c22-e543e8c69fbf@riken.jp>

On Tue, 17 Nov 2020, Jens Domke wrote:

> I have used ibacm successfully years ago (think somewhere in the
> 2013-2015 timeframe) but abandoned the approach because some
> measurements indicated that using OpenMPI with rdmacm had a big
> runtime overhead compared to using OpenMPI+oob (Mellanox was
> informed but I'm unsure how much has changed until now)

Mellanox does not support ibacm... But OK, thanks. Good to know of someone
who has actually used it.

> > Is there something that can locally cache the results of the SM queries to
> > avoid additional requests?
>
> Not that I know of, but others might know better. Maybe try contacting
> Sean Hefty (driver behind ibacm) directly if he missed your email here
> on the list.


I have talked to Ira Weiny, who was the last one to make major changes to
the source, but he does not know of any alternative solution.

> > We have tried IBACM but the address resolution does not work on it. It is
> > unable to complete a request for any address resolution and leaves kernel
> > threads that never terminate instead.
>
> Setting up ibacm was/is painful, maybe you could verify that it works on
> a test bed with lowlevel rdmacm tools to debug with ping-pong, etc.

That was done and the bug was confirmed. There is bitrot in the MAD
communication layer.

> Furthermore, another thing I learned the hard way was that a cold cache
> can overwhelm opensm as well. So, if you deploy ibacm, you have to make
> sure that not too many requests go to the local ibacm on too many nodes
> simultaneously right after starting ibacm service, otherwise having all
> nodes sending numerous requests to opensm could timeout -> could be the
> reason for your stalled kernel threads.

Right. But our cluster only has around 200 nodes at most, so that should be fine.
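
If the cold-cache burst did turn out to matter, the usual remedy is to
stagger the service start across nodes. A minimal sketch of that idea,
assuming a hypothetical helper startup_delay() and made-up hostnames (this
is not part of ibacm or of any IB tooling): each node derives a stable delay
from its own hostname and sleeps that long before starting the local cache
service.

```python
import hashlib


def startup_delay(hostname: str, window_s: float = 60.0) -> float:
    """Map a hostname to a stable delay in [0, window_s).

    Hypothetical helper: the hash spreads nodes roughly evenly over the
    window, so they do not all query the SM at the same moment.
    """
    digest = hashlib.sha256(hostname.encode("utf-8")).digest()
    # 6 bytes = 48 bits, which a float represents exactly, so fraction < 1.
    fraction = int.from_bytes(digest[:6], "big") / 2**48
    return fraction * window_s


if __name__ == "__main__":
    # Hypothetical hostnames, for illustration only.
    for n in range(1, 6):
        host = f"node{n:03d}"
        print(host, round(startup_delay(host), 1))
```

A wrapper on each node would sleep for startup_delay(its own hostname)
seconds before starting the service. The window length is an assumption and
would need tuning against how fast the SM answers.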

> (another explanation is obviously a bug in ibacm and/or incompatibility
> to newer versions of librdmacm or opensm or other IB libs)
>
> Sorry, that I cannot provide more specific and direct help, but maybe my
> pointers can help you solve the issue.

Thanks.


