* [PATCH net-next] net: devmem: remove min_t(iter_iov_len) in sendmsg
@ 2025-05-17 0:04 Stanislav Fomichev
2025-05-17 0:09 ` Al Viro
0 siblings, 1 reply; 11+ messages in thread
From: Stanislav Fomichev @ 2025-05-17 0:04 UTC (permalink / raw)
To: netdev
Cc: davem, edumazet, kuba, pabeni, horms, willemb, sagi, stfomichev,
asml.silence, almasrymina, kaiyuanz, linux-kernel
iter_iov_len looks broken for UBUF. When iov_iter_advance is called
for UBUF, it increments iov_offset and also decrements the count.
This makes the iterator only go over half of the range (unless I'm
missing something). Remove iter_iov_len clamping. net_devmem_get_niov_at
already does PAGE_SIZE alignment and min_t(length) makes sure that
the last chunk (potentially smaller than PAGE_SIZE) is accounted
correctly.
Fixes: bd61848900bf ("net: devmem: Implement TX path")
Signed-off-by: Stanislav Fomichev <stfomichev@gmail.com>
---
net/core/datagram.c | 1 -
1 file changed, 1 deletion(-)
diff --git a/net/core/datagram.c b/net/core/datagram.c
index e04908276a32..48fcfc1dcf47 100644
--- a/net/core/datagram.c
+++ b/net/core/datagram.c
@@ -712,7 +712,6 @@ zerocopy_fill_skb_from_devmem(struct sk_buff *skb, struct iov_iter *from,
return -EFAULT;
size = min_t(size_t, size, length);
- size = min_t(size_t, size, iter_iov_len(from));
get_netmem(net_iov_to_netmem(niov));
skb_add_rx_frag_netmem(skb, i, net_iov_to_netmem(niov), off,
--
2.49.0
^ permalink raw reply related [flat|nested] 11+ messages in thread
* Re: [PATCH net-next] net: devmem: remove min_t(iter_iov_len) in sendmsg
2025-05-17 0:04 [PATCH net-next] net: devmem: remove min_t(iter_iov_len) in sendmsg Stanislav Fomichev
@ 2025-05-17 0:09 ` Al Viro
2025-05-17 1:24 ` Stanislav Fomichev
0 siblings, 1 reply; 11+ messages in thread
From: Al Viro @ 2025-05-17 0:09 UTC (permalink / raw)
To: Stanislav Fomichev
Cc: netdev, davem, edumazet, kuba, pabeni, horms, willemb, sagi,
asml.silence, almasrymina, kaiyuanz, linux-kernel
On Fri, May 16, 2025 at 05:04:31PM -0700, Stanislav Fomichev wrote:
> iter_iov_len looks broken for UBUF. When iov_iter_advance is called
> for UBUF, it increments iov_offset and also decrements the count.
> This makes the iterator only go over half of the range (unless I'm
> missing something).
What do you mean by "broken"? iov_iter_len(from) == "how much data is
left in that iterator". That goes for all flavours, UBUF included...
Confused...
* Re: [PATCH net-next] net: devmem: remove min_t(iter_iov_len) in sendmsg
2025-05-17 0:09 ` Al Viro
@ 2025-05-17 1:24 ` Stanislav Fomichev
2025-05-17 2:06 ` Al Viro
0 siblings, 1 reply; 11+ messages in thread
From: Stanislav Fomichev @ 2025-05-17 1:24 UTC (permalink / raw)
To: Al Viro
Cc: netdev, davem, edumazet, kuba, pabeni, horms, willemb, sagi,
asml.silence, almasrymina, kaiyuanz, linux-kernel
On 05/17, Al Viro wrote:
> On Fri, May 16, 2025 at 05:04:31PM -0700, Stanislav Fomichev wrote:
> > iter_iov_len looks broken for UBUF. When iov_iter_advance is called
> > for UBUF, it increments iov_offset and also decrements the count.
> > This makes the iterator only go over half of the range (unless I'm
> > missing something).
>
> What do you mean by "broken"? iov_iter_len(from) == "how much data is
> left in that iterator". That goes for all flavours, UBUF included...
>
> Confused...
Right, but __ubuf_iovec.iov_len aliases with (total) count:
union {
struct iovec __ubuf_iovec; /* { void *iov_base; size_t iov_len } */
struct {
union { const struct iovec *__iov; ... };
size_t count;
}
}
So any time we do iov_iter_advance(size) (which does iov_offset += size
and count -= size), we essentially subtract the size from count _and_
iov_len. And now, calling iter_iov_len (which does iov_len - iov_offset)
returns a bogus length. I don't think that's intended, or am I missing something?
* Re: [PATCH net-next] net: devmem: remove min_t(iter_iov_len) in sendmsg
2025-05-17 1:24 ` Stanislav Fomichev
@ 2025-05-17 2:06 ` Al Viro
2025-05-17 2:17 ` Stanislav Fomichev
0 siblings, 1 reply; 11+ messages in thread
From: Al Viro @ 2025-05-17 2:06 UTC (permalink / raw)
To: Stanislav Fomichev
Cc: netdev, davem, edumazet, kuba, pabeni, horms, willemb, sagi,
asml.silence, almasrymina, kaiyuanz, linux-kernel
On Fri, May 16, 2025 at 06:24:03PM -0700, Stanislav Fomichev wrote:
> On 05/17, Al Viro wrote:
> > On Fri, May 16, 2025 at 05:04:31PM -0700, Stanislav Fomichev wrote:
> > > iter_iov_len looks broken for UBUF. When iov_iter_advance is called
> > > for UBUF, it increments iov_offset and also decrements the count.
> > > This makes the iterator only go over half of the range (unless I'm
> > > missing something).
> >
> > What do you mean by "broken"? iov_iter_len(from) == "how much data is
> > left in that iterator". That goes for all flavours, UBUF included...
> >
> > Confused...
[snip]
> iov_len. And now, calling iter_iov_len (which does iov_len - iov_offset)
Wait a sec... Sorry, I've misread that as iov_iter_count(); iter_iov_len()
(as well as iter_iov_addr()) should not be used on anything other than
ITER_IOVEC.
<checks>
Wait, in the same commit there's
+ if (iov_iter_type(from) != ITER_IOVEC)
+ return -EFAULT;
shortly prior to the loop where iter_iov_{addr,len}() are used. What am I missing now?
* Re: [PATCH net-next] net: devmem: remove min_t(iter_iov_len) in sendmsg
2025-05-17 2:06 ` Al Viro
@ 2025-05-17 2:17 ` Stanislav Fomichev
2025-05-17 3:39 ` Al Viro
0 siblings, 1 reply; 11+ messages in thread
From: Stanislav Fomichev @ 2025-05-17 2:17 UTC (permalink / raw)
To: Al Viro
Cc: netdev, davem, edumazet, kuba, pabeni, horms, willemb, sagi,
asml.silence, almasrymina, kaiyuanz, linux-kernel
On 05/17, Al Viro wrote:
> On Fri, May 16, 2025 at 06:24:03PM -0700, Stanislav Fomichev wrote:
> > On 05/17, Al Viro wrote:
> > > On Fri, May 16, 2025 at 05:04:31PM -0700, Stanislav Fomichev wrote:
> > > > iter_iov_len looks broken for UBUF. When iov_iter_advance is called
> > > > for UBUF, it increments iov_offset and also decrements the count.
> > > > This makes the iterator only go over half of the range (unless I'm
> > > > missing something).
> > >
> > > What do you mean by "broken"? iov_iter_len(from) == "how much data is
> > > left in that iterator". That goes for all flavours, UBUF included...
> > >
> > > Confused...
>
> [snip]
>
> > iov_len. And now, calling iter_iov_len (which does iov_len - iov_offset)
>
> Wait a sec... Sorry, I've misread that as iov_iter_count(); iter_iov_len()
> (as well as iter_iov_addr()) should not be used on anything other that
> ITER_IOVEC
>
> <checks>
>
> Wait, in the same commit there's
> + if (iov_iter_type(from) != ITER_IOVEC)
> + return -EFAULT;
>
> shortly prior to the loop iter_iov_{addr,len}() are used. What am I missing now?
Yeah, I want to remove that part as well:
https://lore.kernel.org/netdev/20250516225441.527020-1-stfomichev@gmail.com/T/#u
Otherwise, sendmsg() with a single IOV is not accepted, which makes no
sense.
* Re: [PATCH net-next] net: devmem: remove min_t(iter_iov_len) in sendmsg
2025-05-17 2:17 ` Stanislav Fomichev
@ 2025-05-17 3:39 ` Al Viro
2025-05-17 3:53 ` Stanislav Fomichev
0 siblings, 1 reply; 11+ messages in thread
From: Al Viro @ 2025-05-17 3:39 UTC (permalink / raw)
To: Stanislav Fomichev
Cc: netdev, davem, edumazet, kuba, pabeni, horms, willemb, sagi,
asml.silence, almasrymina, kaiyuanz, linux-kernel
On Fri, May 16, 2025 at 07:17:23PM -0700, Stanislav Fomichev wrote:
> > Wait, in the same commit there's
> > + if (iov_iter_type(from) != ITER_IOVEC)
> > + return -EFAULT;
> >
> > shortly prior to the loop iter_iov_{addr,len}() are used. What am I missing now?
>
> Yeah, I want to remove that part as well:
>
> https://lore.kernel.org/netdev/20250516225441.527020-1-stfomichev@gmail.com/T/#u
>
> Otherwise, sendmsg() with a single IOV is not accepted, which makes not
> sense.
Wait a minute. What's there to prevent a call with two ranges far from each other?
* Re: [PATCH net-next] net: devmem: remove min_t(iter_iov_len) in sendmsg
2025-05-17 3:39 ` Al Viro
@ 2025-05-17 3:53 ` Stanislav Fomichev
2025-05-17 4:05 ` Al Viro
0 siblings, 1 reply; 11+ messages in thread
From: Stanislav Fomichev @ 2025-05-17 3:53 UTC (permalink / raw)
To: Al Viro
Cc: netdev, davem, edumazet, kuba, pabeni, horms, willemb, sagi,
asml.silence, almasrymina, kaiyuanz, linux-kernel
On 05/17, Al Viro wrote:
> On Fri, May 16, 2025 at 07:17:23PM -0700, Stanislav Fomichev wrote:
> > > Wait, in the same commit there's
> > > + if (iov_iter_type(from) != ITER_IOVEC)
> > > + return -EFAULT;
> > >
> > > shortly prior to the loop iter_iov_{addr,len}() are used. What am I missing now?
> >
> > Yeah, I want to remove that part as well:
> >
> > https://lore.kernel.org/netdev/20250516225441.527020-1-stfomichev@gmail.com/T/#u
> >
> > Otherwise, sendmsg() with a single IOV is not accepted, which makes not
> > sense.
>
> Wait a minute. What's there to prevent a call with two ranges far from each other?
It is perfectly possible to have a call with two disjoint ranges,
net_devmem_get_niov_at should correctly resolve it to the IOVA in the
dmabuf. Not sure I understand why it's an issue, can you pls clarify?
What we want to have here is:
1. sendmsg(msg.msg_iov = [{ .iov_base = .., .iov_len = x }])
2. sendmsg(msg.msg_iov = [{ .iov_base = .., .iov_len = x },
                          { .iov_base = .., .iov_len = y }])
Both should be accepted. Currently only (2) works because of that ITER_IOVEC
check. Once I remove it, I hit the issue where iter_iov_len makes the
loop stop too early.
(Documentation/networking/devmem.rst has a bit more info on the sendmsg UAPI
with dmabufs if that's still confusing)
* Re: [PATCH net-next] net: devmem: remove min_t(iter_iov_len) in sendmsg
2025-05-17 3:53 ` Stanislav Fomichev
@ 2025-05-17 4:05 ` Al Viro
2025-05-17 4:29 ` Stanislav Fomichev
0 siblings, 1 reply; 11+ messages in thread
From: Al Viro @ 2025-05-17 4:05 UTC (permalink / raw)
To: Stanislav Fomichev
Cc: netdev, davem, edumazet, kuba, pabeni, horms, willemb, sagi,
asml.silence, almasrymina, kaiyuanz, linux-kernel
On Fri, May 16, 2025 at 08:53:09PM -0700, Stanislav Fomichev wrote:
> On 05/17, Al Viro wrote:
> > On Fri, May 16, 2025 at 07:17:23PM -0700, Stanislav Fomichev wrote:
> > > > Wait, in the same commit there's
> > > > + if (iov_iter_type(from) != ITER_IOVEC)
> > > > + return -EFAULT;
> > > >
> > > > shortly prior to the loop iter_iov_{addr,len}() are used. What am I missing now?
> > >
> > > Yeah, I want to remove that part as well:
> > >
> > > https://lore.kernel.org/netdev/20250516225441.527020-1-stfomichev@gmail.com/T/#u
> > >
> > > Otherwise, sendmsg() with a single IOV is not accepted, which makes not
> > > sense.
> >
> > Wait a minute. What's there to prevent a call with two ranges far from each other?
>
> It is perfectly possible to have a call with two disjoint ranges,
> net_devmem_get_niov_at should correctly resolve it to the IOVA in the
> dmabuf. Not sure I understand why it's an issue, can you pls clarify?
Er... OK, the following is given an from with two iovecs.
while (length && iov_iter_count(from)) {
if (i == MAX_SKB_FRAGS)
return -EMSGSIZE;
virt_addr = (size_t)iter_iov_addr(from);
OK, that's iov_base of the first one.
niov = net_devmem_get_niov_at(binding, virt_addr, &off, &size);
if (!niov)
return -EFAULT;
Whatever it does, it does *NOT* see iov_len of the first iovec. Looks like
it tries to set something up, storing the length of what it had set up
into size
size = min_t(size_t, size, length);
... no more than length, OK. Suppose length is considerably more than iov_len
of the first iovec.
size = min_t(size_t, size, iter_iov_len(from));
... now trim it down to iov_len of that sucker. That's what you want to remove,
right? What happens if iov_len is shorter than what we have in size?
get_netmem(net_iov_to_netmem(niov));
skb_add_rx_frag_netmem(skb, i, net_iov_to_netmem(niov), off,
size, PAGE_SIZE);
Still not looking at that iov_len...
iov_iter_advance(from, size);
... and now that you've removed the second min_t, size happens to be greater
than that iovec[0].iov_len. So we advance into the second iovec, skipping
size - iovec[0].iov_len bytes after iovec[1].iov_base.
length -= size;
i++;
}
... and proceed into the second iteration.
Would you agree that behaviour ought to depend upon the iovec[0].iov_len?
If nothing else, it affects which data you want to be sent, and I don't
see where anything would even look at that value with your change...
* Re: [PATCH net-next] net: devmem: remove min_t(iter_iov_len) in sendmsg
2025-05-17 4:05 ` Al Viro
@ 2025-05-17 4:29 ` Stanislav Fomichev
2025-05-17 4:53 ` Mina Almasry
2025-05-20 15:10 ` Pavel Begunkov
0 siblings, 2 replies; 11+ messages in thread
From: Stanislav Fomichev @ 2025-05-17 4:29 UTC (permalink / raw)
To: Al Viro
Cc: netdev, davem, edumazet, kuba, pabeni, horms, willemb, sagi,
asml.silence, almasrymina, kaiyuanz, linux-kernel
On 05/17, Al Viro wrote:
> On Fri, May 16, 2025 at 08:53:09PM -0700, Stanislav Fomichev wrote:
> > On 05/17, Al Viro wrote:
> > > On Fri, May 16, 2025 at 07:17:23PM -0700, Stanislav Fomichev wrote:
> > > > > Wait, in the same commit there's
> > > > > + if (iov_iter_type(from) != ITER_IOVEC)
> > > > > + return -EFAULT;
> > > > >
> > > > > shortly prior to the loop iter_iov_{addr,len}() are used. What am I missing now?
> > > >
> > > > Yeah, I want to remove that part as well:
> > > >
> > > > https://lore.kernel.org/netdev/20250516225441.527020-1-stfomichev@gmail.com/T/#u
> > > >
> > > > Otherwise, sendmsg() with a single IOV is not accepted, which makes not
> > > > sense.
> > >
> > > Wait a minute. What's there to prevent a call with two ranges far from each other?
> >
> > It is perfectly possible to have a call with two disjoint ranges,
> > net_devmem_get_niov_at should correctly resolve it to the IOVA in the
> > dmabuf. Not sure I understand why it's an issue, can you pls clarify?
>
> Er... OK, the following is given an from with two iovecs.
>
> while (length && iov_iter_count(from)) {
> if (i == MAX_SKB_FRAGS)
> return -EMSGSIZE;
>
> virt_addr = (size_t)iter_iov_addr(from);
>
> OK, that's iov_base of the first one.
>
> niov = net_devmem_get_niov_at(binding, virt_addr, &off, &size);
> if (!niov)
> return -EFAULT;
> Whatever it does, it does *NOT* see iov_len of the first iovec. Looks like
> it tries to set something up, storing the length of what it had set up
> into size
>
> size = min_t(size_t, size, length);
> ... no more than length, OK. Suppose length is considerably more than iov_len
> of the first iovec.
>
> size = min_t(size_t, size, iter_iov_len(from));
> ... now trim it down to iov_len of that sucker. That's what you want to remove,
> right? What happens if iov_len is shorter than what we have in size?
>
> get_netmem(net_iov_to_netmem(niov));
> skb_add_rx_frag_netmem(skb, i, net_iov_to_netmem(niov), off,
> size, PAGE_SIZE);
> Still not looking at that iov_len...
>
> iov_iter_advance(from, size);
> ... and now that you've removed the second min_t, size happens to be greater
> than that iovec[0].iov_len. So we advance into the second iovec, skipping
> size - iovec[0].iov_len bytes after iovev[1].iov_base.
> length -= size;
> i++;
> }
> ... and proceed into the second iteration.
>
> Would you agree that behaviour ought to depend upon the iovec[0].iov_len?
> If nothing else, it affects which data do you want to be sent, and I don't
> see where would anything even look at that value with your change...
Yes, I think you have a point. I was thinking that net_devmem_get_niov_at
will expose max size of the chunk, but I agree that the iov might have
requested smaller part and it will bug out in case of multiple chunks...
Are you open to making iter_iov_len more ubuf friendly? Something like
the following:
static inline size_t iter_iov_len(const struct iov_iter *i)
{
	if (i->iter_type == ITER_UBUF)
		return i->count;
	return iter_iov(i)->iov_len - i->iov_offset;
}
Or should I handle the iter_type here?
if (iov_iter_type(from) == ITER_IOVEC)
	size = min_t(size_t, size, iter_iov_len(from));
/* else: I don't think I need to clamp to iov_iter_count()
 * because length should take care of it */
* Re: [PATCH net-next] net: devmem: remove min_t(iter_iov_len) in sendmsg
2025-05-17 4:29 ` Stanislav Fomichev
@ 2025-05-17 4:53 ` Mina Almasry
2025-05-20 15:10 ` Pavel Begunkov
1 sibling, 0 replies; 11+ messages in thread
From: Mina Almasry @ 2025-05-17 4:53 UTC (permalink / raw)
To: Stanislav Fomichev
Cc: Al Viro, netdev, davem, edumazet, kuba, pabeni, horms, willemb,
sagi, asml.silence, kaiyuanz, linux-kernel
On Fri, May 16, 2025 at 9:29 PM Stanislav Fomichev <stfomichev@gmail.com> wrote:
>
> On 05/17, Al Viro wrote:
> > On Fri, May 16, 2025 at 08:53:09PM -0700, Stanislav Fomichev wrote:
> > > On 05/17, Al Viro wrote:
> > > > On Fri, May 16, 2025 at 07:17:23PM -0700, Stanislav Fomichev wrote:
> > > > > > Wait, in the same commit there's
> > > > > > + if (iov_iter_type(from) != ITER_IOVEC)
> > > > > > + return -EFAULT;
> > > > > >
> > > > > > shortly prior to the loop iter_iov_{addr,len}() are used. What am I missing now?
> > > > >
> > > > > Yeah, I want to remove that part as well:
> > > > >
> > > > > https://lore.kernel.org/netdev/20250516225441.527020-1-stfomichev@gmail.com/T/#u
> > > > >
> > > > > Otherwise, sendmsg() with a single IOV is not accepted, which makes not
> > > > > sense.
> > > >
> > > > Wait a minute. What's there to prevent a call with two ranges far from each other?
> > >
> > > It is perfectly possible to have a call with two disjoint ranges,
> > > net_devmem_get_niov_at should correctly resolve it to the IOVA in the
> > > dmabuf. Not sure I understand why it's an issue, can you pls clarify?
> >
> > Er... OK, the following is given an from with two iovecs.
> >
> > while (length && iov_iter_count(from)) {
> > if (i == MAX_SKB_FRAGS)
> > return -EMSGSIZE;
> >
> > virt_addr = (size_t)iter_iov_addr(from);
> >
> > OK, that's iov_base of the first one.
> >
> > niov = net_devmem_get_niov_at(binding, virt_addr, &off, &size);
> > if (!niov)
> > return -EFAULT;
> > Whatever it does, it does *NOT* see iov_len of the first iovec. Looks like
> > it tries to set something up, storing the length of what it had set up
> > into size
> >
> > size = min_t(size_t, size, length);
> > ... no more than length, OK. Suppose length is considerably more than iov_len
> > of the first iovec.
> >
> > size = min_t(size_t, size, iter_iov_len(from));
> > ... now trim it down to iov_len of that sucker. That's what you want to remove,
> > right? What happens if iov_len is shorter than what we have in size?
> >
> > get_netmem(net_iov_to_netmem(niov));
> > skb_add_rx_frag_netmem(skb, i, net_iov_to_netmem(niov), off,
> > size, PAGE_SIZE);
> > Still not looking at that iov_len...
> >
> > iov_iter_advance(from, size);
> > ... and now that you've removed the second min_t, size happens to be greater
> > than that iovec[0].iov_len. So we advance into the second iovec, skipping
> > size - iovec[0].iov_len bytes after iovev[1].iov_base.
> > length -= size;
> > i++;
> > }
> > ... and proceed into the second iteration.
> >
> > Would you agree that behaviour ought to depend upon the iovec[0].iov_len?
> > If nothing else, it affects which data do you want to be sent, and I don't
> > see where would anything even look at that value with your change...
>
> Yes, I think you have a point. I was thinking that net_devmem_get_niov_at
> will expose max size of the chunk, but I agree that the iov might have
> requested smaller part and it will bug out in case of multiple chunks...
>
> Are you open to making iter_iov_len more ubuf friendly? Something like
> the following:
>
> static inline size_t iter_iov_len(const struct iov_iter *i)
> {
> if (iter->iter_type == ITER_UBUF)
> return ni->count;
> return iter_iov(i)->iov_len - i->iov_offset;
> }
>
> Or should I handle the iter_type here?
>
> if (iter->iter_type == ITER_IOVEC)
> size = min_t(size_t, size, iter_iov_len(from));
> /* else
> I don think I need to clamp to iov_iter_count() because length
> should take care of it */
Sorry about this, I was worried about testing multiple iovecs because
I imagined the single iovec is just a subcase of that. I did not
expect that a single iov converts into a different iter type, and that
the iter type behaves a bit differently.
I think both approaches Stan points to should be fine. The generic
change may break existing users of ITER_UBUF though.
Let's have some test coverage of this in ncdevmem, if possible.
--
Thanks,
Mina
* Re: [PATCH net-next] net: devmem: remove min_t(iter_iov_len) in sendmsg
2025-05-17 4:29 ` Stanislav Fomichev
2025-05-17 4:53 ` Mina Almasry
@ 2025-05-20 15:10 ` Pavel Begunkov
1 sibling, 0 replies; 11+ messages in thread
From: Pavel Begunkov @ 2025-05-20 15:10 UTC (permalink / raw)
To: Stanislav Fomichev, Al Viro
Cc: netdev, davem, edumazet, kuba, pabeni, horms, willemb, sagi,
almasrymina, kaiyuanz, linux-kernel
On 5/17/25 05:29, Stanislav Fomichev wrote:
> On 05/17, Al Viro wrote:
>> On Fri, May 16, 2025 at 08:53:09PM -0700, Stanislav Fomichev wrote:
>>> On 05/17, Al Viro wrote:
>>>> On Fri, May 16, 2025 at 07:17:23PM -0700, Stanislav Fomichev wrote:
>>>>>> Wait, in the same commit there's
>>>>>> + if (iov_iter_type(from) != ITER_IOVEC)
>>>>>> + return -EFAULT;
>>>>>>
>>>>>> shortly prior to the loop iter_iov_{addr,len}() are used. What am I missing now?
>>>>>
>>>>> Yeah, I want to remove that part as well:
>>>>>
>>>>> https://lore.kernel.org/netdev/20250516225441.527020-1-stfomichev@gmail.com/T/#u
>>>>>
>>>>> Otherwise, sendmsg() with a single IOV is not accepted, which makes not
>>>>> sense.
>>>>
>>>> Wait a minute. What's there to prevent a call with two ranges far from each other?
>>>
>>> It is perfectly possible to have a call with two disjoint ranges,
>>> net_devmem_get_niov_at should correctly resolve it to the IOVA in the
>>> dmabuf. Not sure I understand why it's an issue, can you pls clarify?
>>
>> Er... OK, the following is given an from with two iovecs.
>>
>> while (length && iov_iter_count(from)) {
>> if (i == MAX_SKB_FRAGS)
>> return -EMSGSIZE;
>>
>> virt_addr = (size_t)iter_iov_addr(from);
>>
>> OK, that's iov_base of the first one.
>>
>> niov = net_devmem_get_niov_at(binding, virt_addr, &off, &size);
>> if (!niov)
>> return -EFAULT;
>> Whatever it does, it does *NOT* see iov_len of the first iovec. Looks like
>> it tries to set something up, storing the length of what it had set up
>> into size
>>
>> size = min_t(size_t, size, length);
>> ... no more than length, OK. Suppose length is considerably more than iov_len
>> of the first iovec.
>>
>> size = min_t(size_t, size, iter_iov_len(from));
>> ... now trim it down to iov_len of that sucker. That's what you want to remove,
>> right? What happens if iov_len is shorter than what we have in size?
>>
>> get_netmem(net_iov_to_netmem(niov));
>> skb_add_rx_frag_netmem(skb, i, net_iov_to_netmem(niov), off,
>> size, PAGE_SIZE);
>> Still not looking at that iov_len...
>>
>> iov_iter_advance(from, size);
>> ... and now that you've removed the second min_t, size happens to be greater
>> than that iovec[0].iov_len. So we advance into the second iovec, skipping
>> size - iovec[0].iov_len bytes after iovev[1].iov_base.
>> length -= size;
>> i++;
>> }
>> ... and proceed into the second iteration.
>>
>> Would you agree that behaviour ought to depend upon the iovec[0].iov_len?
>> If nothing else, it affects which data do you want to be sent, and I don't
>> see where would anything even look at that value with your change...
>
> Yes, I think you have a point. I was thinking that net_devmem_get_niov_at
> will expose max size of the chunk, but I agree that the iov might have
> requested smaller part and it will bug out in case of multiple chunks...
>
> Are you open to making iter_iov_len more ubuf friendly? Something like
> the following:
>
> static inline size_t iter_iov_len(const struct iov_iter *i)
> {
> if (iter->iter_type == ITER_UBUF)
> return ni->count;
> return iter_iov(i)->iov_len - i->iov_offset;
> }
>
> Or should I handle the iter_type here?
>
> if (iter->iter_type == ITER_IOVEC)
> size = min_t(size_t, size, iter_iov_len(from));
> /* else
> I don think I need to clamp to iov_iter_count() because length
> should take care of it */
FWIW, since it's not devmem specific, I looked through the callers:
io_uring handles ubuf separately, read_write.c and madvise.c advance
strictly by iov_iter_count() and hence always consume ubuf iters in
one go. So the only one with a real problem is devmem tx, which
hasn't been released yet.
With that said, it feels error prone, and IMO we should either make the
helper work well with ubuf, as Stan suggested, or force _all_ users to
check if it's ubuf. Also, I can't say for madvise, but this is not in
any hot / important path of either io_uring or rw, so we likely
don't care about the extra check.
--
Pavel Begunkov