Linux RDMA and InfiniBand development
From: Kevan Rehm <kevanrehm@gmail.com>
To: Jason Gunthorpe <jgg@ziepe.ca>
Cc: Mark Zhang <markzhang@nvidia.com>,
	Leon Romanovsky <leon@kernel.org>,
	"linux-rdma@vger.kernel.org" <linux-rdma@vger.kernel.org>,
	Yishai Hadas <yishaih@nvidia.com>,
	kevan.rehm@hpe.com, chien.tin.tung@intel.com
Subject: Re: Segfault in mlx5 driver on infiniband after application fork
Date: Fri, 16 Feb 2024 14:56:56 -0500
Message-ID: <54DF121B-B413-4977-9F57-FCEB92FF4BB9@gmail.com>
In-Reply-To: <20240212164533.GG765010@ziepe.ca>

> 
> Newer kernels are detected and disable the DONT_FORK calls in verbs.
> 
> rdma-core support is present since:
> 
> commit 67b00c3835a3480a035a9e1bcf5695f5c0e8568e
> Author: Gal Pressman <galpress@amazon.com>
> Date:   Sun Apr 4 17:24:54 2021 +0300
> 
>    verbs: Report when ibv_fork_init() is not needed
> 
>    Identify kernels which do not require ibv_fork_init() to be called and
>    report it through the ibv_is_fork_initialized() verb.
> 
>    The feature detection is done through a new read-only attribute in the
>    get sys netlink command. If the attribute is not reported, assume old
>    kernel without COF support. If the attribute is reported, use the
>    returned value.
> 
>    This allows ibv_is_fork_initialized() to return the previously unused
>    IBV_FORK_UNNEEDED value, which takes precedence over the
>    DISABLED/ENABLED values. Meaning that if the kernel does not require a
>    call to ibv_fork_init(), IBV_FORK_UNNEEDED will be returned regardless
>    of whether ibv_fork_init() was called or not.
> 
>    Signed-off-by: Gal Pressman <galpress@amazon.com>
> 
> The kernel support was in v5.13-rc1~78^2~1
> 
> And backported in a few cases.
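
The precedence rule in the quoted commit can be modeled in a few lines of C. This is a sketch, not rdma-core's implementation: the enum is a local stand-in for libibverbs' IBV_FORK_DISABLED / IBV_FORK_ENABLED / IBV_FORK_UNNEEDED values, and model_fork_status() and its parameters are hypothetical names for the inputs the commit message describes.

```c
#include <stdbool.h>

/* Local stand-ins for libibverbs' enum ibv_fork_status values, so this
 * model compiles without <infiniband/verbs.h>. */
enum fork_status { FORK_DISABLED, FORK_ENABLED, FORK_UNNEEDED };

/* Hypothetical model of the rule in the quoted commit message.
 *   attr_reported: the get-sys netlink reply included the COF attribute
 *   attr_value:    the value of that attribute, if it was reported
 *   init_called:   whether ibv_fork_init() was called earlier */
enum fork_status model_fork_status(bool attr_reported, bool attr_value,
                                   bool init_called)
{
    /* A missing attribute means an old kernel without COF support;
     * otherwise the reported value decides. */
    bool kernel_has_cof = attr_reported && attr_value;

    /* UNNEEDED takes precedence over DISABLED/ENABLED, regardless of
     * whether ibv_fork_init() was called. */
    if (kernel_has_cof)
        return FORK_UNNEEDED;
    return init_called ? FORK_ENABLED : FORK_DISABLED;
}
```

In this model, once the kernel reports the attribute as true, the init_called input stops mattering, which is the behavior the commit describes.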

To work around this, I had to run my benchmark under gdb with a breakpoint on ibv_fork_init() to track down all of its callers, which turned out to be both UCX and Libfabric. I then had to download the source repos, examine the code, and determine for each one which environment variable controls its calls to ibv_fork_init(). For Libfabric, I had to ensure that RDMA_FORK_SAFE and IBV_FORK_SAFE, which my team members routinely use, were not set. For UCX, I had to set UCX_IB_FORK_INIT=no; otherwise UCX calls ibv_fork_init() by default. With UCX_IB_FORK_INIT=no, scary error messages about registered memory corruption are printed to stderr whenever there is a fork, even though that is no longer true with up-to-date kernels. Folks who don’t know the details of ibv_fork_init() behavior are going to be reluctant to set UCX_IB_FORK_INIT=no.
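The environment checks described above can be captured in a small preflight helper. This is a hypothetical sketch based only on the behavior described in this message, not on either library's source: libfabric_would_fork_init() and ucx_would_fork_init() are made-up names, treating any value of RDMA_FORK_SAFE or IBV_FORK_SAFE as "set" is an assumption, and so is treating only the literal string "no" as turning UCX_IB_FORK_INIT off.

```c
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical: true if either variable Libfabric consults is present
 * in the environment (any value counts as "set", an assumption). */
bool libfabric_would_fork_init(void)
{
    return getenv("RDMA_FORK_SAFE") != NULL ||
           getenv("IBV_FORK_SAFE") != NULL;
}

/* Hypothetical: UCX calls ibv_fork_init() by default, so this is true
 * unless UCX_IB_FORK_INIT is exactly "no". Recognizing only the literal
 * "no" is an assumption; UCX may accept other spellings. */
bool ucx_would_fork_init(void)
{
    const char *v = getenv("UCX_IB_FORK_INIT");
    return v == NULL || strcmp(v, "no") != 0;
}
```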

If ibv_fork_init() checked the kernel and simply returned without initializing mm_root when the kernel has enhanced fork support, then all of the environment variable hassles would go away: the environment variable settings wouldn’t matter, and ibv_fork_init() would always do the right thing. This seems like a big win to me. Am I missing some downside?

Thanks, Kevan
