* Re: [RFC 00/12] io_uring dmabuf read/write support
[not found] <cover.1751035820.git.asml.silence@gmail.com>
@ 2025-07-03 14:23 ` Christoph Hellwig
2025-07-03 14:37 ` Christian König
2025-07-07 11:15 ` Pavel Begunkov
0 siblings, 2 replies; 6+ messages in thread
From: Christoph Hellwig @ 2025-07-03 14:23 UTC (permalink / raw)
To: Pavel Begunkov
Cc: io-uring, linux-block, linux-nvme, linux-fsdevel, Keith Busch,
David Wei, Vishal Verma, Sumit Semwal, Christian König,
linux-media, dri-devel, linaro-mm-sig
[Note: it would be really useful to Cc all relevant maintainers]
On Fri, Jun 27, 2025 at 04:10:27PM +0100, Pavel Begunkov wrote:
> This series implements it for read/write io_uring requests. The uAPI
> looks similar to normal registered buffers: the user will need to
> register a dmabuf in io_uring first and then use it as any other
> registered buffer. On registration the user also specifies a file
> to map the dmabuf for.
Just commenting from the in-kernel POV here, where the interface
feels wrong.
You can't just expose 'the DMA device' up through file operations,
because there can be, and often is, more than one. Similarly, stuffing
a dma_addr_t into an iovec is rather dangerous.
The model that should work much better is to have file operations
to attach to / detach from a dma_buf, and then have an iter that
specifies a dmabuf and offsets into it. That way the code behind the
file operations can forward the attachment to all the needed
devices (including more or fewer while it remains attached to the file)
and can pick the right dma address for each device.
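Roughly this kind of shape, where every name is made up for illustration
(not existing kernel API): attach/detach become file-level operations, so
the file can fan one dmabuf attachment out to however many devices back
it, each keeping its own mapping.

```c
/* Illustrative only: all names are hypothetical, not kernel API.  The
 * point is that attach/detach are file-level operations, so the file
 * can fan one dmabuf attachment out to however many devices back it,
 * each keeping its own DMA mapping. */

struct dma_buf;                         /* opaque exporter object */

struct toy_device {
        int attached;           /* stands in for a per-device DMA mapping */
};

#define MAX_DEVS 4

struct toy_file {
        struct toy_device *devs[MAX_DEVS];
        int nr_devs;
};

/* file op: attach the dmabuf to every device behind the file */
static int toy_dmabuf_attach(struct toy_file *f, struct dma_buf *buf)
{
        (void)buf;              /* a real version would attach the buf here */
        for (int i = 0; i < f->nr_devs; i++)
                f->devs[i]->attached = 1;
        return 0;
}

/* file op: tear all per-device attachments back down */
static void toy_dmabuf_detach(struct toy_file *f)
{
        for (int i = 0; i < f->nr_devs; i++)
                f->devs[i]->attached = 0;
}
```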
I also remember some discussion that new dma-buf importers should
use the dynamic importer model for long-term imports, but as I'm
anything but an expert in that area I'll let the dma-buf folks
speak.
* Re: [RFC 00/12] io_uring dmabuf read/write support
2025-07-03 14:23 ` [RFC 00/12] io_uring dmabuf read/write support Christoph Hellwig
@ 2025-07-03 14:37 ` Christian König
2025-07-07 11:15 ` Pavel Begunkov
1 sibling, 0 replies; 6+ messages in thread
From: Christian König @ 2025-07-03 14:37 UTC (permalink / raw)
To: Christoph Hellwig, Pavel Begunkov
Cc: io-uring, linux-block, linux-nvme, linux-fsdevel, Keith Busch,
David Wei, Vishal Verma, Sumit Semwal, linux-media, dri-devel,
linaro-mm-sig
On 03.07.25 16:23, Christoph Hellwig wrote:
> [Note: it would be really useful to Cc all relevant maintainers]
>
> On Fri, Jun 27, 2025 at 04:10:27PM +0100, Pavel Begunkov wrote:
>> This series implements it for read/write io_uring requests. The uAPI
>> looks similar to normal registered buffers: the user will need to
>> register a dmabuf in io_uring first and then use it as any other
>> registered buffer. On registration the user also specifies a file
>> to map the dmabuf for.
>
> Just commenting from the in-kernel POV here, where the interface
> feels wrong.
>
> You can't just expose 'the DMA device' up through file operations,
> because there can be, and often is, more than one. Similarly, stuffing
> a dma_addr_t into an iovec is rather dangerous.
>
> The model that should work much better is to have file operations
> to attach to / detach from a dma_buf, and then have an iter that
> specifies a dmabuf and offsets into it. That way the code behind the
> file operations can forward the attachment to all the needed
> devices (including more or fewer while it remains attached to the file)
> and can pick the right dma address for each device.
>
> I also remember some discussion that new dma-buf importers should
> use the dynamic importer model for long-term imports, but as I'm
> anything but an expert in that area I'll let the dma-buf folks
> speak.
Completely correct.
As long as you don't have a really good explanation and some mechanism to prevent abuse, long-term pinning of DMA-bufs should be avoided.
Regards,
Christian.
* Re: [RFC 00/12] io_uring dmabuf read/write support
2025-07-03 14:23 ` [RFC 00/12] io_uring dmabuf read/write support Christoph Hellwig
2025-07-03 14:37 ` Christian König
@ 2025-07-07 11:15 ` Pavel Begunkov
2025-07-07 14:48 ` Christoph Hellwig
1 sibling, 1 reply; 6+ messages in thread
From: Pavel Begunkov @ 2025-07-07 11:15 UTC (permalink / raw)
To: Christoph Hellwig
Cc: io-uring, linux-block, linux-nvme, linux-fsdevel, Keith Busch,
David Wei, Vishal Verma, Sumit Semwal, Christian König,
linux-media, dri-devel, linaro-mm-sig
On 7/3/25 15:23, Christoph Hellwig wrote:
> [Note: it would be really useful to Cc all relevant maintainers]
Will do next time
> On Fri, Jun 27, 2025 at 04:10:27PM +0100, Pavel Begunkov wrote:
>> This series implements it for read/write io_uring requests. The uAPI
>> looks similar to normal registered buffers: the user will need to
>> register a dmabuf in io_uring first and then use it as any other
>> registered buffer. On registration the user also specifies a file
>> to map the dmabuf for.
>
> Just commenting from the in-kernel POV here, where the interface
> feels wrong.
>
> You can't just expose 'the DMA device' up through file operations,
> because there can be, and often is, more than one. Similarly, stuffing
> a dma_addr_t into an iovec is rather dangerous.
>
> The model that should work much better is to have file operations
> to attach to / detach from a dma_buf, and then have an iter that
> specifies a dmabuf and offsets into it. That way the code behind the
> file operations can forward the attachment to all the needed
> devices (including more or fewer while it remains attached to the file)
> and can pick the right dma address for each device.
By "iter that specifies a dmabuf" do you mean an opaque file-specific
structure allocated inside the new fop? Akin to what Keith proposed back
then. That sounds good and has more potential for various optimisations.
My concern would be growing struct iov_iter by an extra pointer:
struct dma_seg {
        size_t off;
        unsigned len;
};

struct iov_iter {
        union {
                struct iovec *iov;
                struct dma_seg *dmav;
                ...
        };
        void *dma_token;
};
But maybe that's fine. It's 40B -> 48B, and it'll get back to
40 when / if xarray_start / ITER_XARRAY is removed.
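On a 64-bit build the arithmetic checks out with a stand-in layout: the
field set below is simplified from the real iov_iter, and dma_token is
the hypothetical addition under discussion.

```c
#include <stdint.h>
#include <stddef.h>

/* Stand-in layout to illustrate the size arithmetic on LP64; the field
 * set is simplified from the real iov_iter, and dma_token is the
 * hypothetical addition being discussed. */
struct dma_seg {
        size_t off;
        unsigned len;
};

struct iter_base {                      /* roughly today's 40-byte core */
        uint8_t iter_type;
        _Bool nofault;
        _Bool data_source;
        size_t iov_offset;
        size_t count;
        const struct dma_seg *dmav;     /* stands in for the vec-pointer union */
        unsigned long nr_segs;
};

struct iter_grown {                     /* with the extra token pointer */
        struct iter_base base;
        void *dma_token;
};
```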
> I also remember some discussion that new dma-buf importers should
> use the dynamic importer model for long-term imports, but as I'm
> anything but an expert in that area I'll let the dma-buf folks
> speak.
I'll take a look
--
Pavel Begunkov
* Re: [RFC 00/12] io_uring dmabuf read/write support
2025-07-07 11:15 ` Pavel Begunkov
@ 2025-07-07 14:48 ` Christoph Hellwig
2025-07-07 15:41 ` Pavel Begunkov
0 siblings, 1 reply; 6+ messages in thread
From: Christoph Hellwig @ 2025-07-07 14:48 UTC (permalink / raw)
To: Pavel Begunkov
Cc: Christoph Hellwig, io-uring, linux-block, linux-nvme,
linux-fsdevel, Keith Busch, David Wei, Vishal Verma, Sumit Semwal,
Christian König, linux-media, dri-devel, linaro-mm-sig
On Mon, Jul 07, 2025 at 12:15:54PM +0100, Pavel Begunkov wrote:
> > to attach to / detach from a dma_buf, and then have an iter that
> > specifies a dmabuf and offsets into it. That way the code behind the
> > file operations can forward the attachment to all the needed
> > devices (including more or fewer while it remains attached to the file)
> > and can pick the right dma address for each device.
>
> By "iter that specifies a dmabuf" do you mean an opaque file-specific
> structure allocated inside the new fop?
I mean a reference to the actual dma_buf (probably indirectly through
the file * for it, but listen to the dma_buf experts on that, not me).
> Akin to what Keith proposed back
> then. That sounds good and has more potential for various optimisations.
> My concern would be growing struct iov_iter by an extra pointer:
> struct iov_iter {
>         union {
>                 struct iovec *iov;
>                 struct dma_seg *dmav;
>                 ...
>         };
>         void *dma_token;
> };
>
> But maybe that's fine. It's 40B -> 48B,
Alternatively we could have the union point to a struct that has the
dma_buf pointer and a variable-length array of dma_segs. Not sure if
that would create a mess in the callers, though.
> and it'll get back to
> 40 when / if xarray_start / ITER_XARRAY is removed.
Would it? At least for 64-bit architectures nr_segs is the same size.
* Re: [RFC 00/12] io_uring dmabuf read/write support
2025-07-07 14:48 ` Christoph Hellwig
@ 2025-07-07 15:41 ` Pavel Begunkov
2025-07-08 9:45 ` Christoph Hellwig
0 siblings, 1 reply; 6+ messages in thread
From: Pavel Begunkov @ 2025-07-07 15:41 UTC (permalink / raw)
To: Christoph Hellwig
Cc: io-uring, linux-block, linux-nvme, linux-fsdevel, Keith Busch,
David Wei, Vishal Verma, Sumit Semwal, Christian König,
linux-media, dri-devel, linaro-mm-sig
On 7/7/25 15:48, Christoph Hellwig wrote:
> On Mon, Jul 07, 2025 at 12:15:54PM +0100, Pavel Begunkov wrote:
>>> to attach to / detach from a dma_buf, and then have an iter that
>>> specifies a dmabuf and offsets into it. That way the code behind the
>>> file operations can forward the attachment to all the needed
>>> devices (including more or fewer while it remains attached to the file)
>>> and can pick the right dma address for each device.
>>
>> By "iter that specifies a dmabuf" do you mean an opaque file-specific
>> structure allocated inside the new fop?
>
> I mean a reference to the actual dma_buf (probably indirectly through
> the file * for it, but listen to the dma_buf experts on that, not me).
My expectation is that io_uring would pass the struct dma_buf to the
file during registration, so that it can do a bunch of work upfront,
but iterators will carry something already pre-attached and
pre-DMA-mapped, probably in a file-specific format hiding the details
of multi-device support, and possibly bundled with the dma-buf pointer
if necessary. (All modulo move_notify, which I need to look into first.)
>> Akin to what Keith proposed back
>> then. That sounds good and has more potential for various optimisations.
>> My concern would be growing struct iov_iter by an extra pointer:
>
>> struct iov_iter {
>>         union {
>>                 struct iovec *iov;
>>                 struct dma_seg *dmav;
>>                 ...
>>         };
>>         void *dma_token;
>> };
>>
>> But maybe that's fine. It's 40B -> 48B,
>
> Alternatively we could have the union point to a struct that has the
> dma_buf pointer and a variable-length array of dma_segs. Not sure if
> that would create a mess in the callers, though.
Iteration helpers adjust the pointer, so the iter either needs to store
the pointer directly or keep the current index. It could rely
solely on offsets, but that'll be a mess with nested loops (where the
inner one would walk some kind of sg table).
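For illustration, a minimal pointer-advancing helper in the style of the
existing iov helpers (names made up, not the kernel's actual iterate
machinery): the segment pointer is mutated in place as segments are
consumed, which is why it has to live directly in the iter rather than
behind an indirection the helper can't update.

```c
#include <stddef.h>

struct dma_seg {
        size_t off;
        unsigned len;
};

/* Sketch only, not the kernel's iterate machinery. */
struct iter_ptr {
        const struct dma_seg *dmav;
        size_t iov_offset;              /* offset into the current segment */
        unsigned long nr_segs;
};

static void advance_ptr(struct iter_ptr *it, size_t bytes)
{
        while (bytes) {
                size_t left = it->dmav->len - it->iov_offset;

                if (bytes < left) {
                        it->iov_offset += bytes;
                        return;
                }
                bytes -= left;
                it->dmav++;             /* the pointer itself moves */
                it->nr_segs--;
                it->iov_offset = 0;
        }
}
```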
>> and it'll get back to
>> 40 when / if xarray_start / ITER_XARRAY is removed.
>
> Would it? At least for 64-bit architectures nr_segs is the same size.
Ah yes
--
Pavel Begunkov
* Re: [RFC 00/12] io_uring dmabuf read/write support
2025-07-07 15:41 ` Pavel Begunkov
@ 2025-07-08 9:45 ` Christoph Hellwig
0 siblings, 0 replies; 6+ messages in thread
From: Christoph Hellwig @ 2025-07-08 9:45 UTC (permalink / raw)
To: Pavel Begunkov
Cc: Christoph Hellwig, io-uring, linux-block, linux-nvme,
linux-fsdevel, Keith Busch, David Wei, Vishal Verma, Sumit Semwal,
Christian König, linux-media, dri-devel, linaro-mm-sig
On Mon, Jul 07, 2025 at 04:41:23PM +0100, Pavel Begunkov wrote:
> > I mean a reference to the actual dma_buf (probably indirectly through
> > the file * for it, but listen to the dma_buf experts on that, not me).
>
> My expectation is that io_uring would pass struct dma_buf to the
io_uring isn't the only user. We've already had one other use case
come up very recently: pre-loading of media files on mobile. It's
also a really good interface for P2P transfers of any kind.
> file during registration, so that it can do a bunch of work upfront,
> but iterators will carry sth already pre-attached and pre dma mapped,
> probably in a file specific format hiding details for multi-device
> support, and possibly bundled with the dma-buf pointer if necessary.
> (All modulo move notify which I need to look into first).
I'd expect that the exporter passes around the dma_buf, and something
that has access to it then imports it to the file. This could be
directly forwarded to the device for the initial scope of your series,
where you only support it for block device files.
Now we have two variants:
1) the file instance returns a cookie for the registration that the
caller has to pass into every read/write
2) the file instance tracks said cookie itself and matches it on
every read/write
1) sounds faster, 2) has more sanity checking and could prevent things
from going wrong.
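As a toy model of the trade-off (all names hypothetical, and the cookie
reduced to a bare pointer just to show the shape):

```c
/* Toy model of the two variants; every name is made up and the "cookie"
 * is reduced to a bare pointer just to illustrate the trade-off. */

struct dma_buf { int id; };

struct reg_file {
        struct dma_buf *registered;     /* variant 2: the file tracks it */
};

/* Variant 1: registration hands back a cookie the caller must pass
 * into every subsequent read/write; no lookup, no cross-checking. */
static struct dma_buf *register_v1(struct reg_file *f, struct dma_buf *b)
{
        (void)f;
        return b;                       /* cookie == opaque handle */
}

/* Variant 2: the file remembers the registration and matches it on
 * each I/O, catching callers that pass in the wrong buffer. */
static void register_v2(struct reg_file *f, struct dma_buf *b)
{
        f->registered = b;
}

static int io_v2(struct reg_file *f, struct dma_buf *b)
{
        return f->registered == b ? 0 : -1;     /* -EINVAL-style rejection */
}
```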
(all this is based on my limited dma_buf understanding, corrections
always welcome).
> > > But maybe that's fine. It's 40B -> 48B,
> >
> > Alternatively we could have the union point to a struct that has the
> > dma_buf pointer and a variable-length array of dma_segs. Not sure if
> > that would create a mess in the callers, though.
>
> Iteration helpers adjust the pointer, so either it needs to store
> the pointer directly in iter or keep the current index. It could rely
> solely on offsets, but that'll be a mess with nested loops (where the
> inner one would walk some kind of sg table).
Yeah. Maybe just keep it as a separate pointer growing the structure
and see if anyone screams.