From: Jason Gunthorpe <jgg@ziepe.ca>
To: Kevan Rehm <kevanrehm@gmail.com>
Cc: Mark Zhang <markzhang@nvidia.com>,
Leon Romanovsky <leon@kernel.org>,
"linux-rdma@vger.kernel.org" <linux-rdma@vger.kernel.org>,
Yishai Hadas <yishaih@nvidia.com>,
kevan.rehm@hpe.com
Subject: Re: Segfault in mlx5 driver on infiniband after application fork
Date: Mon, 12 Feb 2024 09:33:03 -0400
Message-ID: <20240212133303.GA765010@ziepe.ca>
In-Reply-To: <3CAF66C4-32E1-4258-9656-D886843D7771@gmail.com>
On Sun, Feb 11, 2024 at 02:24:16PM -0500, Kevan Rehm wrote:
>
> >> An application started by pytorch does a fork, then the child
> >> process attempts to use libfabric to open a new DAOS infiniband
> >> endpoint. The original endpoint is owned and still in use by the
> >> parent process.
> >>
> >> When the parent process created the endpoint (fi_fabric,
> >> fi_domain, fi_endpoint calls), the mlx5 driver allocated memory
> >> pages for use in SRQ creation, and issued a madvise to say that
> >> the pages are DONTFORK. These pages are associated with the
> >> domain’s ibv_device which is cached in the driver. After the fork
> >> when the child process calls fi_domain for its new endpoint, it
> >> gets the ibv_device that was cached at the time it was created by
> >> the parent. The child process immediately segfaults when trying
> >> to create a SRQ, because the pages associated with that
> >> ibv_device are not in the child’s memory. There doesn’t appear
> >> to be any way for a child process to create a fresh endpoint
> >> because of the caching being done for ibv_devices.
>
> > For anyone who is interested in this issue, please follow the links below:
> > https://github.com/ofiwg/libfabric/issues/9792
> > https://daosio.atlassian.net/browse/DAOS-15117
> >
> > Regarding the issue, I don't know whether mlx5 is actively used to run
> > libfabric, but the mentioned call to ibv_dontfork_range() has existed
> > since the prehistoric era.
>
> Yes, libfabric has used mlx5 for a long time.
>
> > Do you have any environment variables set related to rdma-core?
> >
> IBV_FORK_SAFE is set to 1
>
> > Is it related to ibv_fork_init()? It must be called when fork() is called.
>
> Calling ibv_fork_init() doesn’t help, because it immediately checks mm_root, sees it is non-zero (from the parent process’s prior call), and returns doing nothing.
> There is now a simplified test case, see https://github.com/ofiwg/libfabric/issues/9792 for ongoing analysis.

This was all fixed in the kernel; upgrade your kernel and forking
works much more reliably, but I'm not sure this case will work.

It is a libfabric problem if it expects memory that is registered
for RDMA to be used by both processes after a fork. That cannot work.
Don't do that, or make the memory MAP_SHARED so that the fork children
can access it.
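
A minimal sketch of the MAP_SHARED suggestion, using only plain
POSIX/Linux calls and no verbs API: a page mapped with
MAP_SHARED | MAP_ANONYMOUS stays mapped in both parent and child after
fork(), so a write in the child is visible to the parent. The helper
name shared_page_roundtrip() is hypothetical. This only illustrates the
mapping semantics; it does not show that an RDMA registration or an SRQ
created by the parent becomes usable in the child.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <sys/mman.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper (not from the thread): returns the value the
 * parent reads from a MAP_SHARED page after the child wrote to it,
 * or -1 on error. */
int shared_page_roundtrip(void)
{
    int *p = mmap(NULL, sizeof(*p), PROT_READ | PROT_WRITE,
                  MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return -1;
    *p = 1;

    pid_t pid = fork();
    if (pid < 0) {
        munmap(p, sizeof(*p));
        return -1;
    }
    if (pid == 0) {
        *p = 42;        /* child writes to the same shared page */
        _exit(0);
    }

    int status;
    pid_t w = waitpid(pid, &status, 0);
    if (w < 0) {
        munmap(p, sizeof(*p));
        return -1;
    }
    assert(w == pid);   /* we waited for exactly this child */

    int value = *p;     /* parent observes the child's write */
    munmap(p, sizeof(*p));
    return value;
}
```

With a MAP_PRIVATE mapping instead, the parent would still read 1,
because the child's write would go to its own copy-on-write page.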

The bug reports seem a bit confused: there is no issue with ibv_device
sharing, only with actually sharing the underlying registered memory,
i.e. sharing an SRQ memory pool between the child and parent.

"Fork safe" does not magically make all scenarios work; it is
targeted at a specific use case where an RDMA-using process forks and
the child does not continue to use RDMA.

Jason
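
For reference, the segfault in the quoted report is consistent with
ordinary MADV_DONTFORK behaviour: a range marked this way is simply not
present in the child's address space after fork(), so any access there
faults. The sketch below reproduces that with plain anonymous memory,
assuming Linux. The helper name child_faults_on_dontfork_page() is
hypothetical, and nothing here calls rdma-core or mlx5; it only mirrors
what the thread says ibv_dontfork_range() arranges for the driver's
pages.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <signal.h>
#include <sys/mman.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper (not from the thread): returns 1 if a forked
 * child is killed by SIGSEGV when it touches a page the parent marked
 * MADV_DONTFORK, 0 if it is not, and -1 on error. */
int child_faults_on_dontfork_page(void)
{
    long page = sysconf(_SC_PAGESIZE);
    if (page <= 0)
        return -1;

    char *p = mmap(NULL, (size_t)page, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return -1;
    p[0] = 'x';         /* make sure the page is populated */

    /* Ask the kernel not to copy this range into fork() children. */
    if (madvise(p, (size_t)page, MADV_DONTFORK) != 0) {
        munmap(p, (size_t)page);
        return -1;
    }

    pid_t pid = fork();
    if (pid < 0) {
        munmap(p, (size_t)page);
        return -1;
    }
    if (pid == 0) {
        /* The range is unmapped here, so this read should fault. */
        volatile char c = *(volatile char *)p;
        (void)c;
        _exit(0);
    }

    int status;
    pid_t w = waitpid(pid, &status, 0);
    munmap(p, (size_t)page);
    if (w < 0)
        return -1;
    assert(w == pid);   /* we waited for exactly this child */

    return WIFSIGNALED(status) && WTERMSIG(status) == SIGSEGV;
}
```

The parent keeps using the page normally; only the child, which holds a
stale pointer to memory it never inherited, is killed.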