From: Jason Gunthorpe <jgg@ziepe.ca>
To: Kevan Rehm <kevanrehm@gmail.com>
Cc: Mark Zhang <markzhang@nvidia.com>,
Leon Romanovsky <leon@kernel.org>,
"linux-rdma@vger.kernel.org" <linux-rdma@vger.kernel.org>,
Yishai Hadas <yishaih@nvidia.com>,
kevan.rehm@hpe.com, chien.tin.tung@intel.com
Subject: Re: Segfault in mlx5 driver on infiniband after application fork
Date: Mon, 12 Feb 2024 10:40:13 -0400
Message-ID: <20240212144013.GD765010@ziepe.ca>
In-Reply-To: <8BB93F6F-14EC-4B43-B1F0-5FE185A64073@gmail.com>

On Mon, Feb 12, 2024 at 09:37:25AM -0500, Kevan Rehm wrote:
> > This was all fixed in the kernel, upgrade your kernel and forking
> > works much more reliably, but I'm not sure this case will work.
>
> I agree, that won’t help here.
>
> > It is a libfabric problem if it is expecting memory to be registered
> > for RDMA and be used by both processes in a fork. That cannot work.
> >
> > Don't do that, or make the memory MAP_SHARED so that the fork children
> > can access it.
>
> Libfabric agrees, it wants to use separate registered memory in the
> child, but there doesn’t seem to be a way to do this.

How can that be true? libfabric is the only entity that causes memory
to be registered :)

> > The bugs seem a bit confused, there is no issue with ibv_device
> > sharing. Only with actually sharing underlying registered memory, i.e.
> > sharing an SRQ memory pool between the child and parent.
>
> Libfabric calls rdma_get_devices(), then walks the list looking for
> the entry for the correct domain (mlx5_1). It saves a pointer to
> the matching dev_list entry which is an ibv_context structure.
> Wrapped on that ibv_context is the mlx5 context which contains the
> registered pages that had dontfork set when the parent established
  ^^^^^^^^^^^^^^^^

It does not. Contexts don't have pages; your problem comes from
something else.

Jason