From: Jason Gunthorpe <jgg@ziepe.ca>
To: Daniel Vetter <daniel@ffwll.ch>
Cc: "Gal Pressman" <galpress@amazon.com>,
"Sumit Semwal" <sumit.semwal@linaro.org>,
"Christian König" <christian.koenig@amd.com>,
"Doug Ledford" <dledford@redhat.com>,
"open list:DMA BUFFER SHARING FRAMEWORK"
<linux-media@vger.kernel.org>,
dri-devel <dri-devel@lists.freedesktop.org>,
"Linux Kernel Mailing List" <linux-kernel@vger.kernel.org>,
linux-rdma <linux-rdma@vger.kernel.org>,
"Oded Gabbay" <ogabbay@habana.ai>,
"Tomer Tayar" <ttayar@habana.ai>,
"Yossi Leybovich" <sleybo@amazon.com>,
"Alexander Matushevsky" <matua@amazon.com>,
"Leon Romanovsky" <leonro@nvidia.com>,
"Jianxin Xiong" <jianxin.xiong@intel.com>,
"John Hubbard" <jhubbard@nvidia.com>
Subject: Re: [RFC] Make use of non-dynamic dmabuf in RDMA
Date: Fri, 20 Aug 2021 09:33:16 -0300
Message-ID: <20210820123316.GV543798@ziepe.ca>
In-Reply-To: <CAKMK7uGgQWcs4Va6TGN9akHSSkmTs1i0Kx+6WpeiXWhJKpasLA@mail.gmail.com>

On Fri, Aug 20, 2021 at 09:25:30AM +0200, Daniel Vetter wrote:
> On Fri, Aug 20, 2021 at 1:06 AM Jason Gunthorpe <jgg@ziepe.ca> wrote:
> > On Wed, Aug 18, 2021 at 11:34:51AM +0200, Daniel Vetter wrote:
> > > On Wed, Aug 18, 2021 at 9:45 AM Gal Pressman <galpress@amazon.com> wrote:
> > > >
> > > > Hey all,
> > > >
> > > > Currently, the RDMA subsystem can only work with dynamic dmabuf
> > > > attachments, which requires the RDMA device to support on-demand-paging
> > > > (ODP) which is not common on most devices (only supported by mlx5).
> > > >
> > > > While the dynamic requirement makes sense for certain GPUs, some devices
> > > > (such as habanalabs) have device memory that is always "pinned" and do
> > > > not need/use the move_notify operation.
> > > >
> > > > The motivation of this RFC is to use habanalabs as the dmabuf exporter,
> > > > and EFA as the importer to allow for peer2peer access through libibverbs.
> > > >
> > > > This draft patch changes the dmabuf driver to differentiate between
> > > > static/dynamic attachments by looking at the move_notify op instead of
> > > > the importer_ops struct, and allowing the peer2peer flag to be enabled
> > > > in case of a static exporter.
> > > >
> > > > Thanks
> > > >
> > > > Signed-off-by: Gal Pressman <galpress@amazon.com>
> > >
> > > Given that habanalabs dma-buf support is very firmly in limbo (at
> > > least it's not yet in linux-next or anywhere else) I think you want to
> > > solve that problem first before we tackle the additional issue of
> > > making p2p work without dynamic dma-buf. Without that it just doesn't
> > > make a lot of sense really to talk about solutions here.
> >
> > I have been thinking about adding a dmabuf exporter to VFIO, for
> > basically the same reason habana labs wants to do it.
> >
> > In that situation we'd want to see an approach similar to this as well
> > to have a broad usability.
> >
> > The GPU drivers also want this for certain sophisticated scenarios
> > with RDMA, the intree drivers just haven't quite got there yet.
> >
> > So, I think it is worthwhile to start thinking about this regardless
> > of habana labs.
>
> Oh sure, I've been having these for a while. I think there's two options:
> - some kind of soft-pin, where the contract is that we only revoke
> when absolutely necessary, and it's expected to be catastrophic on the
> importer's side.

Honestly, I'm not very keen on this. We don't really have HW support
in several RDMA scenarios for even catastrophic unpin.

Gal, can EFA even do this for an MR? You basically have to resize the
rkey/lkey to zero length (or invalidate it like an FMR) under the
catastrophic revoke. The rkey/lkey cannot just be destroyed, as that
opens a security problem with rkey/lkey re-use.

I think I saw that EFA's current out-of-tree implementations had this bug.

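To illustrate the rkey/lkey re-use concern, here is a toy userspace model
in C. It is only a sketch under stated assumptions: the key table, the
lowest-free-slot allocator, and every `toy_*` name are hypothetical and do
not correspond to any real RDMA, EFA, or mlx5 interface. It contrasts
destroying the key on revoke with keeping the key allocated at zero length.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model only: none of these names are real RDMA or EFA APIs. */
enum mr_state { MR_FREE, MR_VALID, MR_REVOKED };

struct toy_mr {
	enum mr_state state;
	int owner;      /* which registration currently owns this key slot */
	size_t length;  /* bytes reachable through the key */
};

#define TOY_NUM_KEYS 4
static struct toy_mr key_table[TOY_NUM_KEYS];

/* Hand out the lowest free key slot (an assumed, simplistic allocator). */
static int toy_reg_mr(int owner, size_t length)
{
	for (int key = 0; key < TOY_NUM_KEYS; key++) {
		if (key_table[key].state == MR_FREE) {
			key_table[key] = (struct toy_mr){ MR_VALID, owner, length };
			return key;
		}
	}
	return -1;
}

/* Unsafe revoke: frees the key, so the next registration may reuse it. */
static void toy_revoke_by_destroy(int key)
{
	assert(key >= 0 && key < TOY_NUM_KEYS);
	key_table[key] = (struct toy_mr){ MR_FREE, 0, 0 };
}

/* Safer revoke: the key stays allocated but covers zero bytes. */
static void toy_revoke_in_place(int key)
{
	assert(key >= 0 && key < TOY_NUM_KEYS);
	key_table[key].state = MR_REVOKED;
	key_table[key].length = 0;
}

/*
 * Returns the owner whose memory a remote access with this key would
 * reach, or -1 if the access is rejected.
 */
static int toy_remote_access(int key, size_t offset)
{
	assert(key >= 0 && key < TOY_NUM_KEYS);
	const struct toy_mr *mr = &key_table[key];

	if (mr->state != MR_VALID || offset >= mr->length)
		return -1;
	return mr->owner;
}
```

In this model, a peer still holding the old key after `toy_revoke_by_destroy()`
reaches whichever registration is handed the same slot next, while after
`toy_revoke_in_place()` the stale key is rejected and new registrations get
a different slot.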
> to do is mmap revoke), and I think that model of exclusive device
> ownership with the option to revoke fits pretty well for at least some
> of the accelerators floating around. In that case importers would
> never get a move_notify (maybe we should call this revoke_notify to
> make it clear it's a bit different) callback, except when the entire
> thing has been yanked. I think that would fit pretty well for VFIO,
> and I think we should be able to make it work for rdma too as some
> kind of auto-deregister. The locking might be fun with both of these
> since I expect some inversions compared to the register path, we'll
> have to figure these out.

It fits semantically nicely, VFIO also has a revoke semantic for BAR
mappings.

The challenge is the RDMA side which doesn't have a 'dma disabled
error state' for objects as part of the spec.

Some HW, like mlx5, can implement this for MR objects (see revoke_mr),
but I don't know if anything else can, and even mlx5 currently can't
do a revoke for any other object type.

I don't know how useful it would be, need to check on some of the use
cases.

The locking is tricky as we have to issue a device command, but that
device command cannot run concurrently with destruction or the tail
part of creation.

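The ordering constraint in the last paragraph can be sketched as a small
state machine in userspace C. This is an illustration under assumptions,
not how any driver does it: the states, the `toy_*` names, and the use of
C11 atomics in place of real kernel locking are all hypothetical. The point
is only that a revoke may claim an object that is fully created and not
being destroyed, and that destroy must not start while a revoke is in
flight.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical object states; not taken from any real driver. */
enum obj_state {
	OBJ_CREATING,   /* creation has not finished its tail part */
	OBJ_LIVE,       /* fully created, DMA allowed */
	OBJ_REVOKING,   /* revoke command in flight */
	OBJ_REVOKED,    /* DMA disabled, object still exists */
	OBJ_DESTROYING  /* teardown has started */
};

struct toy_obj {
	_Atomic int state;
	int device_cmds;  /* counts simulated revoke commands */
};

static void toy_obj_init(struct toy_obj *obj)
{
	assert(obj);
	atomic_init(&obj->state, OBJ_CREATING);
	obj->device_cmds = 0;
}

/* Last step of creation: only now may a revoke target the object. */
static void toy_create_finish(struct toy_obj *obj)
{
	atomic_store(&obj->state, OBJ_LIVE);
}

/* Revoke claims the object by moving LIVE -> REVOKING, else backs off. */
static bool toy_revoke(struct toy_obj *obj)
{
	int expected = OBJ_LIVE;

	if (!atomic_compare_exchange_strong(&obj->state, &expected, OBJ_REVOKING))
		return false;
	obj->device_cmds++;  /* stand-in for issuing the device command */
	atomic_store(&obj->state, OBJ_REVOKED);
	return true;
}

/*
 * Destroy may start from LIVE or REVOKED. It returns false while creation
 * is unfinished or a revoke is in flight; a real implementation would have
 * to wait or retry rather than give up.
 */
static bool toy_destroy_begin(struct toy_obj *obj)
{
	int expected = OBJ_LIVE;

	if (atomic_compare_exchange_strong(&obj->state, &expected, OBJ_DESTROYING))
		return true;
	expected = OBJ_REVOKED;
	return atomic_compare_exchange_strong(&obj->state, &expected, OBJ_DESTROYING);
}
```

A kernel implementation would more likely use a mutex or similar and would
need to handle sleeping device commands and the inversions against the
register path mentioned above, none of which this sketch models.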
Jason