Linux RDMA and InfiniBand development
* Race condition between / wrong load order of ib_umad and ib_ipoib
@ 2020-06-02 15:11 Benjamin Drung
  2020-06-02 19:50 ` Jason Gunthorpe
  0 siblings, 1 reply; 5+ messages in thread
From: Benjamin Drung @ 2020-06-02 15:11 UTC (permalink / raw)
  To: linux-rdma

Hi,

After a kernel upgrade to version 4.19 (built in-house with Mellanox
OFED drivers), some of our systems fail to bring up their IPoIB devices
on boot. Different HCAs are affected (e.g. MT4099 and MT26428). We are
using rdma-core on Debian and have IPoIB devices (like `ib0.dddd`)
configured in `/etc/network/interfaces`. Big clusters seem to be more
affected than smaller ones. When the failure occurs, we see this kernel
message:

```
ib0.dddd: P_Key 0xdddd is not found
```

Pinging other hosts then fails with:

```
ping: sendmsg: Network is unreachable
```
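To tell whether the port really lacks the P_Key when the message appears, one can read the port's P_Key table from sysfs. The following is only a minimal sketch: the helper name `pkey_in_table` is hypothetical, the directory layout `/sys/class/infiniband/<device>/ports/<port>/pkeys/` is the usual sysfs location but the device name in the usage note is an assumption, and comparing only the lower 15 bits (ignoring the membership bit) is my assumption about how the match should be made.

```python
from pathlib import Path


def pkey_in_table(pkeys_dir, pkey):
    """Return True if `pkey` appears in the given sysfs P_Key table directory.

    Each file in the directory is expected to hold one hex value such as
    "0xffff". Empty slots (0x0000) are ignored.
    """
    wanted = pkey & 0x7FFF  # assumption: ignore the membership (top) bit
    for entry in Path(pkeys_dir).iterdir():
        try:
            value = int(entry.read_text().strip(), 16)
        except (OSError, ValueError):
            continue  # skip unreadable or malformed entries
        if value != 0 and (value & 0x7FFF) == wanted:
            return True
    return False
```

A call could look like `pkey_in_table("/sys/class/infiniband/mlx4_0/ports/1/pkeys", 0x8001)`, where both `mlx4_0` and `0x8001` are placeholder values. Running it in a loop right after boot might show whether the table is simply populated later than the IPoIB child interface is brought up.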

Upgrading to rdma-core 29.0 did not change anything. Excluding all
InfiniBand kernel modules from the initrd made the issue less likely
to occur, but did not fix it.

We found one report on the Internet describing a similar issue, which
claims that the solution is to fix the module load order:
https://community.brightcomputing.com/question/5d6614ba08e8e81e885f18ef

We use the default `/etc/rdma/modules/infiniband.conf` shipped in the
Debian package:

```
# These modules are loaded by the system if any InfiniBand device is installed
# InfiniBand over IP netdevice
ib_ipoib

# Access to fabric management SMPs and GMPs from userspace.
ib_umad

# SCSI Remote Protocol target support
# ib_srpt

# ib_ucm provides the obsolete /dev/infiniband/ucm0
# ib_ucm
```

Due to this configuration, `ib_ipoib` is loaded before `ib_umad`. After
changing the order in this configuration file to load `ib_umad` before
`ib_ipoib`, the servers come up correctly.
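For checking many hosts, the ordering described above can be verified with a small script. This is a rough sketch, not part of rdma-core: the function names `module_order` and `loads_before` are hypothetical, and it assumes (as observed above) that modules are loaded in the order in which they are listed in the file.

```python
def module_order(conf_text):
    """Return module names in file order, skipping comments and blank lines."""
    modules = []
    for line in conf_text.splitlines():
        name = line.split("#", 1)[0].strip()  # drop trailing or full-line comments
        if name:
            modules.append(name)
    return modules


def loads_before(conf_text, first, second):
    """Return True if both modules are listed and `first` comes before `second`."""
    order = module_order(conf_text)
    if first not in order or second not in order:
        return False
    return order.index(first) < order.index(second)
```

With the shipped file contents, `loads_before(text, "ib_umad", "ib_ipoib")` returns `False`; after swapping the two entries it returns `True`.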

-- 
Benjamin Drung

DevOps Engineer and Debian & Ubuntu Developer
Platform Integration (IONOS Cloud)

1&1 IONOS SE | Greifswalder Str. 207 | 10405 Berlin | Germany
E-mail: benjamin.drung@cloud.ionos.com | Web: www.ionos.de

Registered office: Montabaur, District Court of Montabaur, HRB 24498

Management board: Dr. Christian Böing, Hüseyin Dogan, Dr. Martin Endreß,
Hans-Henning Kettler, Arthur Mai, Matthias Steinberg, Achim Weiß
Chairman of the supervisory board: Markus Kadelke


Member of United Internet


