* Re: [RFC 00/12] io_uring dmabuf read/write support
From: Christoph Hellwig @ 2025-07-03 14:23 UTC
To: Pavel Begunkov
Cc: io-uring, linux-block, linux-nvme, linux-fsdevel, Keith Busch, David Wei,
    Vishal Verma, Sumit Semwal, Christian König, linux-media, dri-devel,
    linaro-mm-sig

[Note: it would be really useful to Cc all relevant maintainers]

On Fri, Jun 27, 2025 at 04:10:27PM +0100, Pavel Begunkov wrote:
> This series implements it for read/write io_uring requests. The uAPI
> looks similar to normal registered buffers, the user will need to
> register a dmabuf in io_uring first and then use it as any other
> registered buffer. On registration the user also specifies a file
> to map the dmabuf for.

Just commenting from the in-kernel POV here, where the interface
feels wrong.

You can't just expose 'the DMA device' up file operations, because
there can be and often is more than one. Similarly stuffing a
dma_addr_t into an iovec is rather dangerous.

The model that should work much better is to have file operations
to attach to / detach from a dma_buf, and then have an iter that
specifies a dmabuf and offsets into it. That way the code behind the
file operations can forward the attachment to all the needed
devices (including more/less while it remains attached to the file)
and can pick the right dma address for each device.

I also remember some discussion that new dma-buf importers should
use the dynamic importer model for long-term imports, but as I'm
anything but an expert in that area I'll let the dma-buf folks
speak.
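To make the attach/detach model above concrete, here is a userspace toy model in C, not kernel code. Every name (model_file, model_attach, model_resolve, the per-device address windows) is hypothetical. It assumes a file backed by two DMA devices: attaching fans out to each device, and an iterator that only names the buffer plus an offset is turned into a device-specific address at I/O time, instead of one dma_addr_t being baked into an iovec.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MODEL_MAX_DEVS 4

/* Hypothetical stand-in for a dma-buf: only its size matters here. */
struct model_dmabuf {
	size_t size;
};

/* One attachment per device; each device sees the buffer at its own
 * bus address. */
struct model_attachment {
	int dev_id;
	uint64_t dma_base;
};

/* A "file" that may be serviced by several DMA devices. */
struct model_file {
	int nr_devs;
	uint64_t dev_window[MODEL_MAX_DEVS];   /* made-up per-device windows */
	struct model_attachment att[MODEL_MAX_DEVS];
	const struct model_dmabuf *attached;
};

/* Stand-in for an "attach" file operation: forward to every device.
 * Pretend each device maps the buffer at the start of its window. */
static int model_attach(struct model_file *f, const struct model_dmabuf *buf)
{
	if (f->attached)
		return -1;                      /* one buffer per file in this toy */
	for (int i = 0; i < f->nr_devs; i++) {
		f->att[i].dev_id = i;
		f->att[i].dma_base = f->dev_window[i];
	}
	f->attached = buf;
	return 0;
}

/* Stand-in for the matching "detach" file operation. */
static void model_detach(struct model_file *f)
{
	f->attached = NULL;
}

/* The iterator names the buffer and an offset into it, nothing more. */
struct model_dmabuf_iter {
	const struct model_dmabuf *buf;
	size_t offset;
	size_t len;
};

/* The file, not the caller, picks the address for the device that
 * ends up servicing the I/O. */
static uint64_t model_resolve(const struct model_file *f,
			      const struct model_dmabuf_iter *it, int dev)
{
	assert(f->attached == it->buf);
	assert(dev >= 0 && dev < f->nr_devs);
	assert(it->offset + it->len <= it->buf->size);
	return f->att[dev].dma_base + it->offset;
}
```

The same iterator resolves to two different addresses depending on which device handles the request, which is the property a single dma_addr_t in an iovec cannot express.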
* Re: [RFC 00/12] io_uring dmabuf read/write support
From: Christian König @ 2025-07-03 14:37 UTC
To: Christoph Hellwig, Pavel Begunkov
Cc: io-uring, linux-block, linux-nvme, linux-fsdevel, Keith Busch, David Wei,
    Vishal Verma, Sumit Semwal, linux-media, dri-devel, linaro-mm-sig

On 03.07.25 16:23, Christoph Hellwig wrote:
> [Note: it would be really useful to Cc all relevant maintainers]
>
> On Fri, Jun 27, 2025 at 04:10:27PM +0100, Pavel Begunkov wrote:
>> This series implements it for read/write io_uring requests. The uAPI
>> looks similar to normal registered buffers, the user will need to
>> register a dmabuf in io_uring first and then use it as any other
>> registered buffer. On registration the user also specifies a file
>> to map the dmabuf for.
>
> Just commenting from the in-kernel POV here, where the interface
> feels wrong.
>
> You can't just expose 'the DMA device' up file operations, because
> there can be and often is more than one. Similarly stuffing a
> dma_addr_t into an iovec is rather dangerous.
>
> The model that should work much better is to have file operations
> to attach to / detach from a dma_buf, and then have an iter that
> specifies a dmabuf and offsets into it. That way the code behind the
> file operations can forward the attachment to all the needed
> devices (including more/less while it remains attached to the file)
> and can pick the right dma address for each device.
>
> I also remember some discussion that new dma-buf importers should
> use the dynamic importer model for long-term imports, but as I'm
> anything but an expert in that area I'll let the dma-buf folks
> speak.

Completely correct.

As long as you don't have a really good explanation and some mechanism
to prevent abuse, long-term pinning of DMA-bufs should be avoided.

Regards,
Christian.
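As a rough illustration of why a dynamic importer avoids long-term pinning, the sketch below models the idea in userspace C. In the kernel an importer opts in via dma_buf_dynamic_attach() with a move_notify callback in struct dma_buf_attach_ops; here all types and functions are hypothetical stand-ins, and the reservation-lock and fence handling the real interface requires is left out. The exporter stays free to move the buffer; the importer only drops its cached mapping and re-maps before its next I/O.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Importer side: a cached mapping that may be invalidated at any time. */
struct toy_importer {
	uint64_t mapped_addr;
	bool mapping_valid;
	int remaps;               /* how often a fresh mapping was needed */
};

/* Exporter side: owns the buffer and may relocate it. */
struct toy_exporter {
	uint64_t backing_addr;    /* where the buffer currently lives */
	struct toy_importer *importer;
};

/* Plays the role of the importer's move_notify callback. */
static void toy_move_notify(struct toy_importer *imp)
{
	imp->mapping_valid = false;
}

/* Instead of being blocked by a pin, the exporter notifies and moves. */
static void toy_exporter_move(struct toy_exporter *ex, uint64_t new_addr)
{
	if (ex->importer)
		toy_move_notify(ex->importer);
	ex->backing_addr = new_addr;
}

/* Re-map lazily, right before issuing I/O. */
static uint64_t toy_addr_for_io(struct toy_exporter *ex,
				struct toy_importer *imp)
{
	assert(ex->importer == imp);
	if (!imp->mapping_valid) {
		imp->mapped_addr = ex->backing_addr;
		imp->mapping_valid = true;
		imp->remaps++;
	}
	return imp->mapped_addr;
}
```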
* Re: [RFC 00/12] io_uring dmabuf read/write support
From: Pavel Begunkov @ 2025-07-07 11:15 UTC
To: Christoph Hellwig
Cc: io-uring, linux-block, linux-nvme, linux-fsdevel, Keith Busch, David Wei,
    Vishal Verma, Sumit Semwal, Christian König, linux-media, dri-devel,
    linaro-mm-sig

On 7/3/25 15:23, Christoph Hellwig wrote:
> [Note: it would be really useful to Cc all relevant maintainers]

Will do next time.

> On Fri, Jun 27, 2025 at 04:10:27PM +0100, Pavel Begunkov wrote:
>> This series implements it for read/write io_uring requests. The uAPI
>> looks similar to normal registered buffers, the user will need to
>> register a dmabuf in io_uring first and then use it as any other
>> registered buffer. On registration the user also specifies a file
>> to map the dmabuf for.
>
> Just commenting from the in-kernel POV here, where the interface
> feels wrong.
>
> You can't just expose 'the DMA device' up file operations, because
> there can be and often is more than one. Similarly stuffing a
> dma_addr_t into an iovec is rather dangerous.
>
> The model that should work much better is to have file operations
> to attach to / detach from a dma_buf, and then have an iter that
> specifies a dmabuf and offsets into it. That way the code behind the
> file operations can forward the attachment to all the needed
> devices (including more/less while it remains attached to the file)
> and can pick the right dma address for each device.

By "iter that specifies a dmabuf" do you mean an opaque file-specific
structure allocated inside the new fop? Akin to what Keith proposed back
then. That sounds good and has more potential for various optimisations.
My concern would be growing struct iov_iter by an extra pointer:

struct dma_seg {
	size_t off;
	unsigned len;
};

struct iov_iter {
	union {
		struct iovec *iov;
		struct dma_seg *dmav;
		...
	};
	void *dma_token;
};

But maybe that's fine. It's 40B -> 48B, and it'll get back to 40 when /
if xarray_start / ITER_XARRAY is removed.

> I also remember some discussion that new dma-buf importers should
> use the dynamic importer model for long-term imports, but as I'm
> anything but an expert in that area I'll let the dma-buf folks
> speak.

I'll take a look

-- 
Pavel Begunkov
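The 40B -> 48B figure can be checked with a simplified userspace mirror of struct iov_iter. The layout below is an assumption based on recent include/linux/uio.h on a 64-bit build, with the pointer union collapsed to two representative members; iter_base and iter_with_token are illustrative names. On LP64 the three one-byte flags pad to 8 bytes, followed by iov_offset, the pointer, count and the nr_segs union, which totals 40 bytes; appending void *dma_token makes it 48.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct dma_seg {
	size_t off;
	unsigned len;
};

/* Simplified mirror of struct iov_iter (assumed 64-bit layout). */
struct iter_base {
	uint8_t iter_type;
	bool nofault;
	bool data_source;            /* 3 bytes, padded to 8 */
	size_t iov_offset;
	union {
		const void *iov;             /* iovec, kvec, bvec, ... */
		const struct dma_seg *dmav;  /* proposed new member */
	};
	size_t count;
	union {
		unsigned long nr_segs;
		int64_t xarray_start;        /* loff_t in the kernel */
	};
};

/* The same iterator plus the proposed dma_token pointer. */
struct iter_with_token {
	struct iter_base base;
	void *dma_token;
};

/* The token simply follows the existing fields. */
static_assert(offsetof(struct iter_with_token, dma_token) ==
	      sizeof(struct iter_base), "dma_token follows the base fields");
```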
* Re: [RFC 00/12] io_uring dmabuf read/write support
From: Christoph Hellwig @ 2025-07-07 14:48 UTC
To: Pavel Begunkov
Cc: Christoph Hellwig, io-uring, linux-block, linux-nvme, linux-fsdevel,
    Keith Busch, David Wei, Vishal Verma, Sumit Semwal, Christian König,
    linux-media, dri-devel, linaro-mm-sig

On Mon, Jul 07, 2025 at 12:15:54PM +0100, Pavel Begunkov wrote:
> > to attach to / detach from a dma_buf, and then have an iter that
> > specifies a dmabuf and offsets into it. That way the code behind the
> > file operations can forward the attachment to all the needed
> > devices (including more/less while it remains attached to the file)
> > and can pick the right dma address for each device.
>
> By "iter that specifies a dmabuf" do you mean an opaque file-specific
> structure allocated inside the new fop?

I mean a reference to the actual dma_buf (probably indirect through the file
* for it, but listen to the dma_buf experts for that and not me).

> Akin to what Keith proposed back
> then. That sounds good and has more potential for various optimisations.
> My concern would be growing struct iov_iter by an extra pointer:

> struct iov_iter {
> 	union {
> 		struct iovec *iov;
> 		struct dma_seg *dmav;
> 		...
> 	};
> 	void *dma_token;
> };
>
> But maybe that's fine. It's 40B -> 48B,

Alternatively we could have the union point to a struct that has the dma buf
pointer and a variable length array of dma_segs. Not sure if that would
create a mess in the callers, though.

> and it'll get back to
> 40 when / if xarray_start / ITER_XARRAY is removed.

Would it? At least for 64-bit architectures nr_segs is the same size.
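Christoph's alternative, and his point about nr_segs, can be sketched the same way. In this hypothetical layout (names are illustrative, not from any tree) the existing pointer union reaches a container holding the dma-buf reference plus a flexible array of dma_segs, so the iterator keeps its 40-byte size on LP64. The second struct drops xarray_start from the trailing union: because unsigned long nr_segs is already 8 bytes on 64-bit, the size does not shrink.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

struct dma_seg {
	size_t off;
	unsigned len;
};

/* Hypothetical container: the dma-buf reference travels with the
 * segment array instead of living in the iterator itself. */
struct dmabuf_segs {
	void *dmabuf;                /* stands in for struct dma_buf * */
	unsigned int nr;
	struct dma_seg segs[];       /* C99 flexible array member */
};

static struct dmabuf_segs *dmabuf_segs_alloc(void *dmabuf, unsigned int nr)
{
	struct dmabuf_segs *s = malloc(sizeof(*s) + nr * sizeof(s->segs[0]));

	if (s) {
		s->dmabuf = dmabuf;
		s->nr = nr;
	}
	return s;
}

/* Simplified iov_iter mirror: one pointer in the union now reaches both
 * the dma-buf and its segments, so no extra member is needed. */
struct iter_alt {
	uint8_t iter_type;
	bool nofault;
	bool data_source;
	size_t iov_offset;
	union {
		const void *iov;
		const struct dmabuf_segs *dmabuf_segs;
	};
	size_t count;
	union {
		unsigned long nr_segs;
		int64_t xarray_start;        /* loff_t in the kernel */
	};
};

/* Same layout with xarray_start gone: nr_segs alone still takes 8 bytes
 * on LP64, so the total stays the same. */
struct iter_alt_no_xarray {
	uint8_t iter_type;
	bool nofault;
	bool data_source;
	size_t iov_offset;
	const void *ptr;
	size_t count;
	unsigned long nr_segs;
};
```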
* Re: [RFC 00/12] io_uring dmabuf read/write support
From: Pavel Begunkov @ 2025-07-07 15:41 UTC
To: Christoph Hellwig
Cc: io-uring, linux-block, linux-nvme, linux-fsdevel, Keith Busch, David Wei,
    Vishal Verma, Sumit Semwal, Christian König, linux-media, dri-devel,
    linaro-mm-sig

On 7/7/25 15:48, Christoph Hellwig wrote:
> On Mon, Jul 07, 2025 at 12:15:54PM +0100, Pavel Begunkov wrote:
>>> to attach to / detach from a dma_buf, and then have an iter that
>>> specifies a dmabuf and offsets into it. That way the code behind the
>>> file operations can forward the attachment to all the needed
>>> devices (including more/less while it remains attached to the file)
>>> and can pick the right dma address for each device.
>>
>> By "iter that specifies a dmabuf" do you mean an opaque file-specific
>> structure allocated inside the new fop?
>
> I mean a reference to the actual dma_buf (probably indirect through the file
> * for it, but listen to the dma_buf experts for that and not me).

My expectation is that io_uring would pass struct dma_buf to the
file during registration, so that it can do a bunch of work upfront,
but iterators will carry something already pre-attached and pre-DMA-mapped,
probably in a file-specific format hiding details for multi-device
support, and possibly bundled with the dma-buf pointer if necessary.
(All modulo move notify, which I need to look into first.)

>> Akin to what Keith proposed back
>> then. That sounds good and has more potential for various optimisations.
>> My concern would be growing struct iov_iter by an extra pointer:
>
>> struct iov_iter {
>> 	union {
>> 		struct iovec *iov;
>> 		struct dma_seg *dmav;
>> 		...
>> 	};
>> 	void *dma_token;
>> };
>>
>> But maybe that's fine. It's 40B -> 48B,
>
> Alternatively we could have the union point to a struct that has the dma buf
> pointer and a variable length array of dma_segs. Not sure if that would
> create a mess in the callers, though.

Iteration helpers adjust the pointer, so either it needs to store
the pointer directly in iter or keep the current index. It could rely
solely on offsets, but that'll be a mess with nested loops (where the
inner one would walk some kind of sg table).

>> and it'll get back to
>> 40 when / if xarray_start / ITER_XARRAY is removed.
>
> Would it? At least for 64-bit architectures nr_segs is the same size.

Ah yes

-- 
Pavel Begunkov
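The constraint Pavel describes, iteration helpers advancing the pointer, is easiest to see side by side. Below, a hypothetical seg_iter_ptr advances the way the iovec helpers do (bumping the segment pointer and decrementing nr_segs), while seg_iter_idx keeps its base pointer fixed, as the container variant would have to, and carries an index instead. Both are simplified userspace models, not kernel code.

```c
#include <assert.h>
#include <stddef.h>

struct dma_seg {
	size_t off;
	unsigned len;
};

/* Style 1: the iterator owns a moving pointer into the segment array. */
struct seg_iter_ptr {
	const struct dma_seg *seg;
	unsigned long nr_segs;
	size_t seg_offset;           /* progress inside *seg */
	size_t count;                /* bytes left overall */
};

static void advance_ptr(struct seg_iter_ptr *it, size_t bytes)
{
	assert(bytes <= it->count);
	it->count -= bytes;
	bytes += it->seg_offset;
	/* Skip every segment that is now fully consumed. */
	while (it->nr_segs && bytes >= it->seg->len) {
		bytes -= it->seg->len;
		it->seg++;
		it->nr_segs--;
	}
	it->seg_offset = bytes;
}

/* Style 2: the pointer stays at the container's base, so the iterator
 * has to remember which segment it is in. */
struct seg_iter_idx {
	const struct dma_seg *base;
	unsigned long nr_segs;
	unsigned long idx;
	size_t seg_offset;
	size_t count;
};

static void advance_idx(struct seg_iter_idx *it, size_t bytes)
{
	assert(bytes <= it->count);
	it->count -= bytes;
	bytes += it->seg_offset;
	while (it->idx < it->nr_segs && bytes >= it->base[it->idx].len) {
		bytes -= it->base[it->idx].len;
		it->idx++;
	}
	it->seg_offset = bytes;
}
```

Both reach the same position; the index variant simply needs one more field, which is the trade-off against storing the segment pointer directly in the iterator.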
* Re: [RFC 00/12] io_uring dmabuf read/write support
From: Christoph Hellwig @ 2025-07-08 9:45 UTC
To: Pavel Begunkov
Cc: Christoph Hellwig, io-uring, linux-block, linux-nvme, linux-fsdevel,
    Keith Busch, David Wei, Vishal Verma, Sumit Semwal, Christian König,
    linux-media, dri-devel, linaro-mm-sig

On Mon, Jul 07, 2025 at 04:41:23PM +0100, Pavel Begunkov wrote:
> > I mean a reference to the actual dma_buf (probably indirect through the file
> > * for it, but listen to the dma_buf experts for that and not me).
>
> My expectation is that io_uring would pass struct dma_buf to the

io_uring isn't the only user. We've already had one other use case
coming up for pre-load of media files in mobile very recently. It's
also a really good interface for P2P transfers of any kind.

> file during registration, so that it can do a bunch of work upfront,
> but iterators will carry something already pre-attached and pre-DMA-mapped,
> probably in a file-specific format hiding details for multi-device
> support, and possibly bundled with the dma-buf pointer if necessary.
> (All modulo move notify, which I need to look into first.)

I'd expect that the exporter passes around the dma_buf, and something
that has access to it then imports it into the file. This could be
directly forwarded to the device for the initial scope in your series,
where you only support it for block device files.

Now we have two variants:

 1) the file instance returns a cookie for the registration that the
    caller has to pass into every read/write
 2) the file instance tracks said cookie itself and matches it on
    every read/write

1) sounds faster, 2) has more sanity checking and could prevent things
from going wrong.

(all this is based on my limited dma_buf understanding, corrections
always welcome).

> > > But maybe that's fine. It's 40B -> 48B,
> >
> > Alternatively we could have the union point to a struct that has the dma buf
> > pointer and a variable length array of dma_segs. Not sure if that would
> > create a mess in the callers, though.
>
> Iteration helpers adjust the pointer, so either it needs to store
> the pointer directly in iter or keep the current index. It could rely
> solely on offsets, but that'll be a mess with nested loops (where the
> inner one would walk some kind of sg table).

Yeah. Maybe just keep it as a separate pointer growing the structure
and see if anyone screams.
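The two registration variants Christoph lists can be contrasted in a small userspace model. Everything here is hypothetical: a file keeps a table of imported dma-bufs. In variant 1 the caller hands back the record returned at import time (the cookie) and the file uses it without any lookup; in variant 2 each read/write only names the dma-buf and the file matches it against its own table, so unknown buffers are rejected at the cost of a search.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_MAX_REGS 8

/* Hypothetical per-file record for one imported dma-buf. */
struct toy_reg {
	const void *dmabuf;          /* stands in for struct dma_buf * */
	unsigned long dma_base;      /* pre-mapped address, set at import */
	int in_use;
};

struct toy_file {
	struct toy_reg regs[TOY_MAX_REGS];
};

/* Import a dma-buf into the file; the returned record doubles as the
 * cookie for variant 1. Returns NULL when the table is full. */
static const struct toy_reg *toy_import(struct toy_file *f, const void *dmabuf,
					unsigned long dma_base)
{
	for (int i = 0; i < TOY_MAX_REGS; i++) {
		if (!f->regs[i].in_use) {
			f->regs[i].dmabuf = dmabuf;
			f->regs[i].dma_base = dma_base;
			f->regs[i].in_use = 1;
			return &f->regs[i];
		}
	}
	return NULL;
}

/* Variant 1: the caller passes the cookie with every read/write and the
 * file uses it directly. The assert is debug-only; with NDEBUG a stale
 * cookie would go unnoticed. */
static unsigned long toy_rw_cookie(const struct toy_reg *cookie,
				   unsigned long off)
{
	assert(cookie->in_use);
	return cookie->dma_base + off;
}

/* Variant 2: the file tracks registrations itself and matches the
 * dma-buf named by each read/write, rejecting unknown buffers. */
static int toy_rw_lookup(const struct toy_file *f, const void *dmabuf,
			 unsigned long off, unsigned long *addr)
{
	for (int i = 0; i < TOY_MAX_REGS; i++) {
		if (f->regs[i].in_use && f->regs[i].dmabuf == dmabuf) {
			*addr = f->regs[i].dma_base + off;
			return 0;
		}
	}
	return -1;
}
```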