From: "Christian König" <christian.koenig@amd.com>
To: Mina Almasry <almasrymina@google.com>
Cc: netdev@vger.kernel.org, linux-media@vger.kernel.org,
dri-devel@lists.freedesktop.org,
"David S. Miller" <davem@davemloft.net>,
Eric Dumazet <edumazet@google.com>,
Jakub Kicinski <kuba@kernel.org>, Paolo Abeni <pabeni@redhat.com>,
Jesper Dangaard Brouer <hawk@kernel.org>,
Ilias Apalodimas <ilias.apalodimas@linaro.org>,
Arnd Bergmann <arnd@arndb.de>, David Ahern <dsahern@kernel.org>,
Willem de Bruijn <willemdebruijn.kernel@gmail.com>,
Sumit Semwal <sumit.semwal@linaro.org>,
Jason Gunthorpe <jgg@ziepe.ca>,
Hari Ramakrishnan <rharix@google.com>,
Dan Williams <dan.j.williams@intel.com>,
Andy Lutomirski <luto@kernel.org>,
stephen@networkplumber.org, sdf@google.com
Subject: Re: [RFC PATCH v2 00/11] Device Memory TCP
Date: Fri, 11 Aug 2023 13:02:50 +0200 [thread overview]
Message-ID: <8e50df70-e05b-e27b-958a-6c97943917d4@amd.com> (raw)
In-Reply-To: <CAHS8izPrOcrJpE1ysCM7rwHhBMPvj0vQwzfOyVqdxsVux8oMww@mail.gmail.com>
Am 10.08.23 um 20:44 schrieb Mina Almasry:
> On Thu, Aug 10, 2023 at 3:29 AM Christian König
> <christian.koenig@amd.com> wrote:
>> Am 10.08.23 um 03:57 schrieb Mina Almasry:
>>> Changes in RFC v2:
>>> ------------------
>>>
>>> The sticking point in RFC v1[1] was the dma-buf pages approach we used to
>>> deliver the device memory to the TCP stack. RFC v2 is a proof-of-concept
>>> that attempts to resolve this by implementing scatterlist support in the
>>> networking stack, such that we can import the dma-buf scatterlist
>>> directly.
>> Impressive work, I didn't thought that this would be possible that "easily".
>>
>> Please note that we have considered replacing scatterlists with simple
>> arrays of DMA-addresses in the DMA-buf framework to avoid people trying
>> to access the struct page inside the scatterlist.
>>
> FWIW, I'm not doing anything with the struct pages inside the
> scatterlist. All I need from the scatterlist are the
> sg_dma_address(sg) and the sg_dma_len(sg), and I'm guessing the array
> you're describing will provide exactly those, but let me know if I
> misunderstood.
Your understanding is perfectly correct.
>
>> It might be a good idea to push for that first before this here is
>> finally implemented.
>>
>> GPU drivers already convert the scatterlist used to arrays of
>> DMA-addresses as soon as they get them. This leaves RDMA and V4L as the
>> other two main users which would need to be converted.
>>
>>> This is the approach proposed at a high level here[2].
>>>
>>> Detailed changes:
>>> 1. Replaced dma-buf pages approach with importing scatterlist into the
>>> page pool.
>>> 2. Replace the dma-buf pages centric API with a netlink API.
>>> 3. Removed the TX path implementation - there is no issue with
>>> implementing the TX path with scatterlist approach, but leaving
>>> out the TX path makes it easier to review.
>>> 4. Functionality is tested with this proposal, but I have not conducted
>>> perf testing yet. I'm not sure there are regressions, but I removed
>>> perf claims from the cover letter until they can be re-confirmed.
>>> 5. Added Signed-off-by: contributors to the implementation.
>>> 6. Fixed some bugs with the RX path since RFC v1.
>>>
>>> Any feedback welcome, but specifically the biggest pending questions
>>> needing feedback IMO are:
>>>
>>> 1. Feedback on the scatterlist-based approach in general.
>> As far as I can see this sounds like the right thing to do in general.
>>
>> Question is rather if we should stick with scatterlist, use array of
>> DMA-addresses or maybe even come up with a completely new structure.
>>
> As far as I can tell, it should be trivial to switch this device
> memory TCP implementation to anything that provides:
>
> 1. DMA-addresses (sg_dma_address() equivalent)
> 2. lengths (sg_dma_len() equivalent)
>
> if you go that route. Specifically, I think it will be pretty much a
> localized change to netdev_bind_dmabuf_to_queue() implemented in this
> patch:
> https://lore.kernel.org/netdev/ZNULIDzuVVyfyMq2@ziepe.ca/T/#m2d344b08f54562cc9155c3f5b018cbfaed96036f
Thanks, that's exactly what I wanted to hear.
>
>>> 2. Netlink API (Patch 1 & 2).
>> How does netlink manage the lifetime of objects?
>>
> Netlink itself doesn't handle the lifetime of the binding. However,
> the API I implemented unbinds the dma-buf when the netlink socket is
> destroyed. I do this so that even if the user process crashes or
> forgets to unbind, the dma-buf will still be unbound once the netlink
> socket is closed on the process exit. Details in this patch:
> https://lore.kernel.org/netdev/ZNULIDzuVVyfyMq2@ziepe.ca/T/#m2d344b08f54562cc9155c3f5b018cbfaed96036f
I need to double check the details, but at least of hand that sounds
sufficient for the lifetime requirements of DMA-buf.
Thanks,
Christian.
>
> On Thu, Aug 10, 2023 at 9:07 AM Jason Gunthorpe <jgg@ziepe.ca> wrote:
>> On Thu, Aug 10, 2023 at 12:29:08PM +0200, Christian König wrote:
>>> Am 10.08.23 um 03:57 schrieb Mina Almasry:
>>>> Changes in RFC v2:
>>>> ------------------
>>>>
>>>> The sticking point in RFC v1[1] was the dma-buf pages approach we used to
>>>> deliver the device memory to the TCP stack. RFC v2 is a proof-of-concept
>>>> that attempts to resolve this by implementing scatterlist support in the
>>>> networking stack, such that we can import the dma-buf scatterlist
>>>> directly.
>>> Impressive work, I didn't thought that this would be possible that "easily".
>>>
>>> Please note that we have considered replacing scatterlists with simple
>>> arrays of DMA-addresses in the DMA-buf framework to avoid people trying to
>>> access the struct page inside the scatterlist.
>>>
>>> It might be a good idea to push for that first before this here is finally
>>> implemented.
>>>
>>> GPU drivers already convert the scatterlist used to arrays of DMA-addresses
>>> as soon as they get them. This leaves RDMA and V4L as the other two main
>>> users which would need to be converted.
>> Oh that would be a nightmare for RDMA.
>>
>> We need a standard based way to have scalable lists of DMA addresses :(
>>
>>>> 2. Netlink API (Patch 1 & 2).
>>> How does netlink manage the lifetime of objects?
>> And access control..
>>
> Someone will correct me if I'm wrong but I'm not sure netlink itself
> will do (sufficient) access control. However I meant for the netlink
> API to bind dma-bufs to be a CAP_NET_ADMIN API, and I forgot to add
> this check in this proof-of-concept, sorry. I'll add a CAP_NET_ADMIN
> check in netdev_bind_dmabuf_to_queue() in the next iteration.
next prev parent reply other threads:[~2023-08-11 11:03 UTC|newest]
Thread overview: 47+ messages / expand[flat|nested] mbox.gz Atom feed top
2023-08-10 1:57 [RFC PATCH v2 00/11] Device Memory TCP Mina Almasry
2023-08-10 1:57 ` [RFC PATCH v2 01/11] net: add netdev netlink api to bind dma-buf to a net device Mina Almasry
2023-08-10 16:04 ` Samudrala, Sridhar
2023-08-11 2:19 ` Mina Almasry
2023-08-10 1:57 ` [RFC PATCH v2 02/11] netdev: implement netlink api to bind dma-buf to netdevice Mina Almasry
2023-08-13 11:26 ` Leon Romanovsky
2023-08-14 1:10 ` David Ahern
2023-08-14 3:15 ` Mina Almasry
2023-08-30 12:38 ` Yunsheng Lin
2023-09-08 0:47 ` David Wei
2023-08-10 1:57 ` [RFC PATCH v2 03/11] netdev: implement netdevice devmem allocator Mina Almasry
2023-08-10 1:57 ` [RFC PATCH v2 04/11] memory-provider: updates to core provider API for devmem TCP Mina Almasry
2023-08-10 1:57 ` [RFC PATCH v2 05/11] memory-provider: implement dmabuf devmem memory provider Mina Almasry
2023-08-10 1:57 ` [RFC PATCH v2 06/11] page-pool: add device memory support Mina Almasry
2023-08-19 9:51 ` Jesper Dangaard Brouer
2023-08-19 14:08 ` Willem de Bruijn
2023-08-19 15:22 ` Jesper Dangaard Brouer
2023-08-19 15:49 ` David Ahern
2023-08-19 16:12 ` Willem de Bruijn
2023-08-21 21:31 ` Jakub Kicinski
2023-08-22 0:58 ` Willem de Bruijn
2023-08-19 16:11 ` Willem de Bruijn
2023-08-19 20:24 ` Mina Almasry
2023-08-19 20:27 ` Mina Almasry
2023-09-08 2:32 ` David Wei
2023-08-22 6:05 ` Mina Almasry
2023-08-22 12:24 ` Jesper Dangaard Brouer
2023-08-22 23:33 ` Mina Almasry
2023-08-10 1:57 ` [RFC PATCH v2 07/11] net: support non paged skb frags Mina Almasry
2023-08-10 1:57 ` [RFC PATCH v2 08/11] net: add support for skbs with unreadable frags Mina Almasry
2023-08-10 1:57 ` [RFC PATCH v2 09/11] tcp: implement recvmsg() RX path for devmem TCP Mina Almasry
2023-08-10 1:57 ` [RFC PATCH v2 10/11] net: add SO_DEVMEM_DONTNEED setsockopt to release RX pages Mina Almasry
2023-08-10 1:57 ` [RFC PATCH v2 11/11] selftests: add ncdevmem, netcat for devmem TCP Mina Almasry
2023-08-10 10:29 ` [RFC PATCH v2 00/11] Device Memory TCP Christian König
2023-08-10 16:06 ` Jason Gunthorpe
2023-08-10 18:44 ` Mina Almasry
2023-08-10 18:58 ` Jason Gunthorpe
2023-08-11 1:56 ` Mina Almasry
2023-08-11 11:02 ` Christian König [this message]
2023-08-14 1:12 ` David Ahern
2023-08-14 2:11 ` Mina Almasry
2023-08-17 18:00 ` Pavel Begunkov
2023-08-17 22:18 ` Mina Almasry
2023-08-23 22:52 ` David Wei
2023-08-24 3:35 ` David Ahern
2023-08-15 13:38 ` David Laight
2023-08-15 14:41 ` Willem de Bruijn
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=8e50df70-e05b-e27b-958a-6c97943917d4@amd.com \
--to=christian.koenig@amd.com \
--cc=almasrymina@google.com \
--cc=arnd@arndb.de \
--cc=dan.j.williams@intel.com \
--cc=davem@davemloft.net \
--cc=dri-devel@lists.freedesktop.org \
--cc=dsahern@kernel.org \
--cc=edumazet@google.com \
--cc=hawk@kernel.org \
--cc=ilias.apalodimas@linaro.org \
--cc=jgg@ziepe.ca \
--cc=kuba@kernel.org \
--cc=linux-media@vger.kernel.org \
--cc=luto@kernel.org \
--cc=netdev@vger.kernel.org \
--cc=pabeni@redhat.com \
--cc=rharix@google.com \
--cc=sdf@google.com \
--cc=stephen@networkplumber.org \
--cc=sumit.semwal@linaro.org \
--cc=willemdebruijn.kernel@gmail.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox