From: Kevan Rehm <kevanrehm@gmail.com>
To: Jason Gunthorpe <jgg@ziepe.ca>
Cc: Mark Zhang <markzhang@nvidia.com>,
Leon Romanovsky <leon@kernel.org>,
"linux-rdma@vger.kernel.org" <linux-rdma@vger.kernel.org>,
Yishai Hadas <yishaih@nvidia.com>,
kevan.rehm@hpe.com, chien.tin.tung@intel.com
Subject: Re: Segfault in mlx5 driver on infiniband after application fork
Date: Fri, 16 Feb 2024 14:56:56 -0500
Message-ID: <54DF121B-B413-4977-9F57-FCEB92FF4BB9@gmail.com>
In-Reply-To: <20240212164533.GG765010@ziepe.ca>
>
> Newer kernels are detected and disable the DONT_FORK calls in verbs.
>
> rdma-core support is present since:
>
> commit 67b00c3835a3480a035a9e1bcf5695f5c0e8568e
> Author: Gal Pressman <galpress@amazon.com>
> Date: Sun Apr 4 17:24:54 2021 +0300
>
> verbs: Report when ibv_fork_init() is not needed
>
> Identify kernels which do not require ibv_fork_init() to be called and
> report it through the ibv_is_fork_initialized() verb.
>
> The feature detection is done through a new read-only attribute in the
> get sys netlink command. If the attribute is not reported, assume old
> kernel without COF support. If the attribute is reported, use the
> returned value.
>
> This allows ibv_is_fork_initialized() to return the previously unused
> IBV_FORK_UNNEEDED value, which takes precedence over the
> DISABLED/ENABLED values. Meaning that if the kernel does not require a
> call to ibv_fork_init(), IBV_FORK_UNNEEDED will be returned regardless
> of whether ibv_fork_init() was called or not.
>
> Signed-off-by: Gal Pressman <galpress@amazon.com>
>
> The kernel support was in v5.13-rc1~78^2~1
>
> And backported in a few cases.
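For readers less familiar with the quoted commit, the selection rule it describes can be modeled in a few lines. This is a minimal sketch, not rdma-core code: the real names are `ibv_is_fork_initialized()` and the `IBV_FORK_DISABLED`/`IBV_FORK_ENABLED`/`IBV_FORK_UNNEEDED` values, while the `sketch_*` enum and function below are hypothetical stand-ins, and the booleans represent the netlink attribute and the kernel's answer as plain inputs rather than an actual query.

```c
#include <stdbool.h>

/* Hypothetical mirror of rdma-core's enum ibv_fork_status, for illustration only. */
enum sketch_fork_status {
    SKETCH_FORK_DISABLED,
    SKETCH_FORK_ENABLED,
    SKETCH_FORK_UNNEEDED,
};

/*
 * Models the rule from the commit message (assumed inputs, not a real API):
 *  - attr_reported == false: old kernel without copy-on-fork (COF) support,
 *    so report ENABLED or DISABLED depending on whether ibv_fork_init() ran.
 *  - attr_reported == true: trust the kernel's value; if it reports COF
 *    support, UNNEEDED takes precedence whether or not ibv_fork_init() ran.
 */
static enum sketch_fork_status
sketch_report_fork_status(bool attr_reported, bool kernel_cof,
                          bool fork_init_called)
{
    bool cof_supported = attr_reported && kernel_cof;

    if (cof_supported)
        return SKETCH_FORK_UNNEEDED;
    return fork_init_called ? SKETCH_FORK_ENABLED : SKETCH_FORK_DISABLED;
}
```

The key property is precedence: once the kernel reports COF support, whether `ibv_fork_init()` was called no longer affects the reported status.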
To work around this, I had to use gdb on my benchmark to set a breakpoint in ibv_fork_init() and track down all of its callers, which turned out to be both UCX and Libfabric. I then had to download the source repos, examine the code, and determine for each one which environment variable controls the calls to ibv_fork_init(). For Libfabric, I had to make sure that RDMA_FORK_SAFE and IBV_FORK_SAFE were not set, even though my team members routinely set them. For UCX, I had to set UCX_IB_FORK_INIT=no, because by default UCX always calls ibv_fork_init(). With UCX_IB_FORK_INIT=no, alarming error messages about registered-memory corruption are printed to stderr whenever there is a fork, even though that is no longer true with up-to-date kernels. People who don’t know the details of ibv_fork_init() behavior are going to be reluctant to set UCX_IB_FORK_INIT=no.
If ibv_fork_init() checked the kernel and simply returned without initializing mm_root when the kernel has enhanced fork support, all of the environment-variable hassles would go away: the settings would no longer matter, and ibv_fork_init() would always do the right thing. This seems like a big win to me; am I missing some downside?
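The proposal can be sketched as a model of the control flow. This is hypothetical: `struct sketch_fork_state` and `sketch_fork_init()` are illustrative stand-ins, not libibverbs internals, and `kernel_cof` stands in for whatever kernel check the library would actually perform. Today an application can approximate the same behavior on its own by guarding its call with `if (ibv_is_fork_initialized() != IBV_FORK_UNNEEDED) ibv_fork_init();`, but that only helps if every caller (UCX, Libfabric, ...) adopts the guard.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the library's internal fork-tracking state. */
struct sketch_fork_state {
    bool kernel_cof;          /* assumed: kernel reports copy-on-fork support */
    bool mm_root_initialized; /* models setting up mm_root range tracking */
};

/*
 * Proposed behavior (sketch): succeed without enabling the old
 * madvise(MADV_DONTFORK)-style tracking when the kernel already makes
 * registered memory safe across fork(); otherwise take the old path.
 */
static int sketch_fork_init(struct sketch_fork_state *st)
{
    if (st->kernel_cof)
        return 0;                 /* nothing to do: kernel handles fork */

    st->mm_root_initialized = true; /* old path: start tracking regions */
    return 0;
}
```

With this shape, callers such as UCX or Libfabric could keep calling ibv_fork_init() unconditionally, and the environment-variable settings would only matter on kernels that still need the tracking.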
Thanks, Kevan