* [PATCH v3] skbuff: Add MSG_MORE flag to optimize large packet transmission
From: Feng Yang @ 2025-06-30 7:10 UTC (permalink / raw)
To: davem, edumazet, kuba, pabeni, horms, willemb, almasrymina,
kerneljasonxing, ebiggers, asml.silence, aleksander.lobakin,
stfomichev, david.laight.linux
Cc: yangfeng, netdev, linux-kernel
From: Feng Yang <yangfeng@kylinos.cn>
The "MSG_MORE" flag is added to improve the transmission performance of large packets.
The improvement is more significant for TCP, while there is a slight enhancement for UDP.
When using sockmap for forwarding, the average latency for different packet sizes
after sending 10,000 packets (TCP) is as follows:
size    old(us)    new(us)
512     56         55
1472    58         58
1600    106        81
3000    145        105
5000    182        125
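As a rough reading of the table above, the relative latency reduction per size can be computed as (old - new) / old. The sketch below uses only the numbers from the table; the struct and function names are hypothetical and exist only for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* One row of the latency table above (values in microseconds). */
struct sample {
	int size;
	double old_us;
	double new_us;
};

static const struct sample samples[] = {
	{ 512, 56, 55 }, { 1472, 58, 58 }, { 1600, 106, 81 },
	{ 3000, 145, 105 }, { 5000, 182, 125 },
};

/* Relative reduction in percent: (old - new) / old * 100. */
static double latency_reduction_pct(double old_us, double new_us)
{
	assert(old_us > 0.0);
	return (old_us - new_us) * 100.0 / old_us;
}

/* Print one line per table row with its relative reduction. */
static void print_reductions(void)
{
	size_t i;

	for (i = 0; i < sizeof(samples) / sizeof(samples[0]); i++)
		printf("%5d bytes: %.1f%% lower latency\n", samples[i].size,
		       latency_reduction_pct(samples[i].old_us,
					     samples[i].new_us));
}
```

By this measure the sizes up to 1472 are essentially unchanged, while 1600, 3000 and 5000 improve by roughly 24%, 28% and 31%.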
Suggested-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Feng Yang <yangfeng@kylinos.cn>
---
Changes in v3:
- Use MSG_MORE flag. Thanks: Eric Dumazet, David Laight.
- Link to v2: https://lore.kernel.org/all/20250627094406.100919-1-yangfeng59949@163.com/
Changes in v2:
- Delete dynamic memory allocation, thanks: Paolo Abeni, Stanislav Fomichev.
- Link to v1: https://lore.kernel.org/all/20250623084212.122284-1-yangfeng59949@163.com/
---
net/core/skbuff.c | 4 ++++
1 file changed, 4 insertions(+)
diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index 85fc82f72d26..cd1ed96607a5 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -3252,6 +3252,8 @@ static int __skb_send_sock(struct sock *sk, struct sk_buff *skb, int offset,
kv.iov_len = slen;
memset(&msg, 0, sizeof(msg));
msg.msg_flags = MSG_DONTWAIT | flags;
+ if (slen < len)
+ msg.msg_flags |= MSG_MORE;
iov_iter_kvec(&msg.msg_iter, ITER_SOURCE, &kv, 1, slen);
ret = INDIRECT_CALL_2(sendmsg, sendmsg_locked,
@@ -3292,6 +3294,8 @@ static int __skb_send_sock(struct sock *sk, struct sk_buff *skb, int offset,
flags,
};
+ if (slen < len)
+ msg.msg_flags |= MSG_MORE;
bvec_set_page(&bvec, skb_frag_page(frag), slen,
skb_frag_off(frag) + offset);
iov_iter_bvec(&msg.msg_iter, ITER_SOURCE, &bvec, 1,
--
2.43.0
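The "slen < len" test in the two hunks above sets MSG_MORE on every piece except the one that finishes the payload. A minimal userspace sketch of that rule follows; chunk_flags() and count_more_chunks() are hypothetical helpers, and the fixed chunk size is an assumption made for illustration (the kernel function walks the skb's linear area and page fragments instead).

```c
#include <assert.h>
#include <stddef.h>
#include <sys/socket.h>

/* Hypothetical helper: flags for one piece of slen bytes while len bytes
 * (this piece included) remain. Mirrors the "slen < len" test in the patch. */
static int chunk_flags(size_t slen, size_t len, int base_flags)
{
	int flags = MSG_DONTWAIT | base_flags;

	if (slen < len)
		flags |= MSG_MORE;
	return flags;
}

/* Walk a len-byte payload in pieces of at most chunk bytes and count how
 * many pieces would be sent with MSG_MORE set. */
static size_t count_more_chunks(size_t len, size_t chunk)
{
	size_t n = 0;

	assert(chunk > 0);
	while (len > 0) {
		size_t slen = len < chunk ? len : chunk;

		if (chunk_flags(slen, len, 0) & MSG_MORE)
			n++;
		len -= slen;
	}
	return n;
}
```

For a 3000-byte payload split into 1024-byte pieces, the first two pieces carry MSG_MORE and the last one does not, so the final call is the one that tells the stack nothing more is pending.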
* Re: [PATCH v3] skbuff: Add MSG_MORE flag to optimize large packet transmission
From: Paolo Abeni @ 2025-07-03 8:48 UTC (permalink / raw)
To: Feng Yang, davem, edumazet, kuba, horms, willemb, almasrymina,
kerneljasonxing, ebiggers, asml.silence, aleksander.lobakin,
stfomichev, david.laight.linux
Cc: yangfeng, netdev, linux-kernel
On 6/30/25 9:10 AM, Feng Yang wrote:
> From: Feng Yang <yangfeng@kylinos.cn>
>
> The "MSG_MORE" flag is added to improve the transmission performance of large packets.
> The improvement is more significant for TCP, while there is a slight enhancement for UDP.
I'm sorry for the conflicting input, but I fear we can't do this for
UDP: unconditionally changing the wire packet layout may break the
application, or at the very least cause unexpected fragmentation issues.
/P
* Re: [PATCH v3] skbuff: Add MSG_MORE flag to optimize large packet transmission
From: David Laight @ 2025-07-03 11:44 UTC (permalink / raw)
To: Paolo Abeni
Cc: Feng Yang, davem, edumazet, kuba, horms, willemb, almasrymina,
kerneljasonxing, ebiggers, asml.silence, aleksander.lobakin,
stfomichev, yangfeng, netdev, linux-kernel
On Thu, 3 Jul 2025 10:48:40 +0200
Paolo Abeni <pabeni@redhat.com> wrote:
> On 6/30/25 9:10 AM, Feng Yang wrote:
> > From: Feng Yang <yangfeng@kylinos.cn>
> >
> > The "MSG_MORE" flag is added to improve the transmission performance of large packets.
> > The improvement is more significant for TCP, while there is a slight enhancement for UDP.
>
> I'm sorry for the conflicting input, but I fear we can't do this for
> UDP: unconditionally changing the wire packet layout may break the
> application, or at the very least cause unexpected fragmentation issues.
Does the code currently work for UDP?
I'd have thought the skb being sent was an entire datagram.
But each sendmsg() is going to send a separate datagram.
IIRC for UDP MSG_MORE indicates that the next send() will be
part of the same datagram - so the actual send can't be done
until the final fragment (without MSG_MORE) is sent.
None of the versions is right for SCTP.
The skb being sent needs to be processed as a single entity.
Here MSG_MORE tells the stack that more messages follow and can be put
into a single ethernet frame - but they are separate protocol messages.
OTOH I've not looked at where this code is called from.
In particular, when it would be called with non-linear skb.
David
* Re: [PATCH v3] skbuff: Add MSG_MORE flag to optimize large packet transmission
From: Feng Yang @ 2025-07-04 9:26 UTC (permalink / raw)
To: david.laight.linux
Cc: aleksander.lobakin, almasrymina, asml.silence, davem, ebiggers,
edumazet, horms, kerneljasonxing, kuba, linux-kernel, netdev,
pabeni, stfomichev, willemb, yangfeng59949, yangfeng
Thu, 3 Jul 2025 12:44:53 +0100 david.laight.linux@gmail.com wrote:
> On Thu, 3 Jul 2025 10:48:40 +0200
> Paolo Abeni <pabeni@redhat.com> wrote:
>
> > On 6/30/25 9:10 AM, Feng Yang wrote:
> > > From: Feng Yang <yangfeng@kylinos.cn>
> > >
> > > The "MSG_MORE" flag is added to improve the transmission performance of large packets.
> > > The improvement is more significant for TCP, while there is a slight enhancement for UDP.
> >
> > I'm sorry for the conflicting input, but I fear we can't do this for
> > UDP: unconditionally changing the wire packet layout may break the
> > application, or at the very least cause unexpected fragmentation issues.
>
> Does the code currently work for UDP?
>
> I'd have thought the skb being sent was an entire datagram.
> But each sendmsg() is going to send a separate datagram.
> IIRC for UDP MSG_MORE indicates that the next send() will be
> part of the same datagram - so the actual send can't be done
> until the final fragment (without MSG_MORE) is sent.
If we add MSG_MORE, won't the entire skb be sent out all at once? Why doesn't this work for UDP?
If that's not feasible, would the v2 version of the code work for UDP?
Thanks.
> None of the versions is right for SCTP.
__skb_send_sock
......
INDIRECT_CALL_2(sendmsg, sendmsg_locked, sendmsg_unlocked, sk, &msg);
......
This sending code doesn't seem to call sctp_sendmsg.
> The skb being sent needs to be processed as a single entity.
> Here MSG_MORE tells the stack that more messages follow and can be put
> into a single ethernet frame - but they are separate protocol messages.
>
> OTOH I've not looked at where this code is called from.
> In particular, when it would be called with non-linear skb.
>
> David
* Re: [PATCH v3] skbuff: Add MSG_MORE flag to optimize large packet transmission
From: Paolo Abeni @ 2025-07-04 15:50 UTC (permalink / raw)
To: Feng Yang, david.laight.linux
Cc: aleksander.lobakin, almasrymina, asml.silence, davem, ebiggers,
edumazet, horms, kerneljasonxing, kuba, linux-kernel, netdev,
stfomichev, willemb, yangfeng
On 7/4/25 11:26 AM, Feng Yang wrote:
> Thu, 3 Jul 2025 12:44:53 +0100 david.laight.linux@gmail.com wrote:
>
>> On Thu, 3 Jul 2025 10:48:40 +0200
>> Paolo Abeni <pabeni@redhat.com> wrote:
>>
>>> On 6/30/25 9:10 AM, Feng Yang wrote:
>>>> From: Feng Yang <yangfeng@kylinos.cn>
>>>>
>>>> The "MSG_MORE" flag is added to improve the transmission performance of large packets.
>>>> The improvement is more significant for TCP, while there is a slight enhancement for UDP.
>>>
>>> I'm sorry for the conflicting input, but I fear we can't do this for
>>> UDP: unconditionally changing the wire packet layout may break the
>>> application, or at the very least cause unexpected fragmentation issues.
>>
>> Does the code currently work for UDP?
>>
>> I'd have thought the skb being sent was an entire datagram.
>> But each sendmsg() is going to send a separate datagram.
>> IIRC for UDP MSG_MORE indicates that the next send() will be
>> part of the same datagram - so the actual send can't be done
>> until the final fragment (without MSG_MORE) is sent.
>
> If we add MSG_MORE, won't the entire skb be sent out all at once? Why doesn't this work for UDP?
Without MSG_MORE N sendmsg() calls will emit on the wire N (small) packets.
With MSG_MORE on the first N-1 calls, the stack will emit a single
packet with larger size.
UDP applications may rely on packet size for protocol semantics, i.e. the
application-level message size could be expected to be equal to the
(wire) packet size itself.
Unexpectedly aggregating the packets may break the application. Also it
can lead to IP fragmentation, which in turn could kill performance.
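The aggregation described above can be observed from userspace on Linux, where send(2) documents that UDP data sent with MSG_MORE is packaged into a single datagram that goes out only on a call without the flag. The following is a minimal sketch over loopback; first_datagram_size() is a hypothetical helper, and it assumes the loopback interface is up.

```c
#include <assert.h>
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical demo helper: send "abc" then "def" over UDP loopback and
 * return the size of the first datagram received, or -1 on any error.
 * With use_more set, the first send() carries MSG_MORE. */
static ssize_t first_datagram_size(int use_more)
{
	struct timeval tv = { .tv_sec = 1, .tv_usec = 0 };
	struct sockaddr_in addr;
	socklen_t alen = sizeof(addr);
	char buf[64];
	ssize_t n = -1;
	int rx, tx;

	rx = socket(AF_INET, SOCK_DGRAM, 0);
	tx = socket(AF_INET, SOCK_DGRAM, 0);
	if (rx < 0 || tx < 0)
		goto out;

	/* Bind the receiver to an ephemeral loopback port. */
	memset(&addr, 0, sizeof(addr));
	addr.sin_family = AF_INET;
	addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
	if (bind(rx, (struct sockaddr *)&addr, sizeof(addr)) < 0 ||
	    getsockname(rx, (struct sockaddr *)&addr, &alen) < 0)
		goto out;
	assert(alen == sizeof(addr));

	/* Bound the wait so a lost datagram cannot hang the demo. */
	if (setsockopt(rx, SOL_SOCKET, SO_RCVTIMEO, &tv, sizeof(tv)) < 0 ||
	    connect(tx, (struct sockaddr *)&addr, sizeof(addr)) < 0)
		goto out;

	if (send(tx, "abc", 3, use_more ? MSG_MORE : 0) != 3 ||
	    send(tx, "def", 3, 0) != 3)
		goto out;

	n = recv(rx, buf, sizeof(buf), 0);
out:
	if (rx >= 0)
		close(rx);
	if (tx >= 0)
		close(tx);
	return n;
}
```

Without MSG_MORE the receiver sees a 3-byte datagram first; with it, the two writes arrive as one 6-byte datagram. A receiver that treats each datagram as one application message would see different message boundaries in the two cases.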
> If that's not feasible, would the v2 version of the code work for UDP?
My ask is to explicitly avoid MSG_MORE when the transport is UDP.
/P
* Re: [PATCH v3] skbuff: Add MSG_MORE flag to optimize large packet transmission
From: David Laight @ 2025-07-05 7:16 UTC (permalink / raw)
To: Paolo Abeni
Cc: Feng Yang, aleksander.lobakin, almasrymina, asml.silence, davem,
ebiggers, edumazet, horms, kerneljasonxing, kuba, linux-kernel,
netdev, stfomichev, willemb, yangfeng
On Fri, 4 Jul 2025 17:50:42 +0200
Paolo Abeni <pabeni@redhat.com> wrote:
> On 7/4/25 11:26 AM, Feng Yang wrote:
> > Thu, 3 Jul 2025 12:44:53 +0100 david.laight.linux@gmail.com wrote:
> >
> >> On Thu, 3 Jul 2025 10:48:40 +0200
> >> Paolo Abeni <pabeni@redhat.com> wrote:
> >>
> >>> On 6/30/25 9:10 AM, Feng Yang wrote:
> >>>> From: Feng Yang <yangfeng@kylinos.cn>
> >>>>
> >>>> The "MSG_MORE" flag is added to improve the transmission performance of large packets.
> >>>> The improvement is more significant for TCP, while there is a slight enhancement for UDP.
> >>>
> >>> I'm sorry for the conflicting input, but I fear we can't do this for
> >>> UDP: unconditionally changing the wire packet layout may break the
> >>> application, or at the very least cause unexpected fragmentation issues.
> >>
> >> Does the code currently work for UDP?
> >>
> >> I'd have thought the skb being sent was an entire datagram.
> >> But each sendmsg() is going to send a separate datagram.
> >> IIRC for UDP MSG_MORE indicates that the next send() will be
> >> part of the same datagram - so the actual send can't be done
> >> until the final fragment (without MSG_MORE) is sent.
> >
> > If we add MSG_MORE, won't the entire skb be sent out all at once? Why doesn't this work for UDP?
>
> Without MSG_MORE N sendmsg() calls will emit on the wire N (small) packets.
>
> With MSG_MORE on the first N-1 calls, the stack will emit a single
> packet with larger size.
>
> UDP applications may rely on packet size for protocol semantics, i.e. the
> application-level message size could be expected to be equal to the
> (wire) packet size itself.
Correct, but the function is __skb_send_sock() - so you'd expect it to
send the 'message' held in the skb to the socket.
I don't think that the fact that the skb has fragments should make any
difference to what is sent.
In other words it ought to be valid for any code to 'linearize' the skb.
David
>
> Unexpectedly aggregating the packets may break the application. Also it
> can lead to IP fragmentation, which in turn could kill performance.
>
> > If that's not feasible, would the v2 version of the code work for UDP?
>
> My ask is to explicitly avoid MSG_MORE when the transport is UDP.
>
> /P
>
* Re: [PATCH v3] skbuff: Add MSG_MORE flag to optimize large packet transmission
From: Feng Yang @ 2025-07-07 6:17 UTC (permalink / raw)
To: pabeni
Cc: aleksander.lobakin, almasrymina, asml.silence, davem,
david.laight.linux, ebiggers, edumazet, horms, kerneljasonxing,
kuba, linux-kernel, netdev, stfomichev, willemb, yangfeng59949,
yangfeng
On Sat, 5 Jul 2025 08:16:40 +0100 David Laight <david.laight.linux@gmail.com> wrote:
> On Fri, 4 Jul 2025 17:50:42 +0200
> Paolo Abeni <pabeni@redhat.com> wrote:
>
> > On 7/4/25 11:26 AM, Feng Yang wrote:
> > > Thu, 3 Jul 2025 12:44:53 +0100 david.laight.linux@gmail.com wrote:
> > >
> > >> On Thu, 3 Jul 2025 10:48:40 +0200
> > >> Paolo Abeni <pabeni@redhat.com> wrote:
> > >>
> > >>> On 6/30/25 9:10 AM, Feng Yang wrote:
> > >>>> From: Feng Yang <yangfeng@kylinos.cn>
> > >>>>
> > >>>> The "MSG_MORE" flag is added to improve the transmission performance of large packets.
> > >>>> The improvement is more significant for TCP, while there is a slight enhancement for UDP.
> > >>>
> > >>> I'm sorry for the conflicting input, but I fear we can't do this for
> > >>> UDP: unconditionally changing the wire packet layout may break the
> > >>> application, or at the very least cause unexpected fragmentation issues.
> > >>
> > >> Does the code currently work for UDP?
> > >>
> > >> I'd have thought the skb being sent was an entire datagram.
> > >> But each sendmsg() is going to send a separate datagram.
> > >> IIRC for UDP MSG_MORE indicates that the next send() will be
> > >> part of the same datagram - so the actual send can't be done
> > >> until the final fragment (without MSG_MORE) is sent.
> > >
> > > If we add MSG_MORE, won't the entire skb be sent out all at once? Why doesn't this work for UDP?
> >
> > Without MSG_MORE N sendmsg() calls will emit on the wire N (small) packets.
> >
> > With MSG_MORE on the first N-1 calls, the stack will emit a single
> > packet with larger size.
> >
> > UDP applications may rely on packet size for protocol semantics, i.e. the
> > application-level message size could be expected to be equal to the
> > (wire) packet size itself.
>
> Correct, but the function is __skb_send_sock() - so you'd expect it to
> send the 'message' held in the skb to the socket.
> I don't think that the fact that the skb has fragments should make any
> difference to what is sent.
> In other words it ought to be valid for any code to 'linearize' the skb.
>
> David
Okay, thank you for your explanations.
> >
> > Unexpectedly aggregating the packets may break the application. Also it
> > can lead to IP fragmentation, which in turn could kill performance.
> >
> > > If that's not feasible, would the v2 version of the code work for UDP?
> >
> > My ask is to explicitly avoid MSG_MORE when the transport is UDP.
> >
> > /P
> >
So do I need to resend the v2 version again (https://lore.kernel.org/all/20250627094406.100919-1-yangfeng59949@163.com/),
or is this version also inapplicable in some cases?
* Re: [PATCH v3] skbuff: Add MSG_MORE flag to optimize large packet transmission
From: Eric Dumazet @ 2025-07-07 7:19 UTC (permalink / raw)
To: Feng Yang
Cc: pabeni, aleksander.lobakin, almasrymina, asml.silence, davem,
david.laight.linux, ebiggers, horms, kerneljasonxing, kuba,
linux-kernel, netdev, stfomichev, willemb, yangfeng
On Sun, Jul 6, 2025 at 11:17 PM Feng Yang <yangfeng59949@163.com> wrote:
>
> So do I need to resend the v2 version again (https://lore.kernel.org/all/20250627094406.100919-1-yangfeng59949@163.com/),
> or is this version also inapplicable in some cases?
Or a V3 perhaps, limiting MSG_MORE hint to TCP sockets where it is
definitely safe.
diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index d6420b74ea9c6a9c53a7c16634cce82a1cd1bbd3..dc440252a68e5e7bb0588ab230fbc5b7a656e220 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -3235,6 +3235,7 @@ typedef int (*sendmsg_func)(struct sock *sk, struct msghdr *msg);
static int __skb_send_sock(struct sock *sk, struct sk_buff *skb, int offset,
int len, sendmsg_func sendmsg, int flags)
{
+ int more_hint = sk_is_tcp(sk) ? MSG_MORE : 0;
unsigned int orig_len = len;
struct sk_buff *head = skb;
unsigned short fragidx;
@@ -3252,7 +3253,8 @@ static int __skb_send_sock(struct sock *sk, struct sk_buff *skb, int offset,
kv.iov_len = slen;
memset(&msg, 0, sizeof(msg));
msg.msg_flags = MSG_DONTWAIT | flags;
-
+ if (slen < len)
+ msg.msg_flags |= more_hint;
iov_iter_kvec(&msg.msg_iter, ITER_SOURCE, &kv, 1, slen);
ret = INDIRECT_CALL_2(sendmsg, sendmsg_locked,
sendmsg_unlocked, sk, &msg);
@@ -3292,6 +3294,8 @@ static int __skb_send_sock(struct sock *sk, struct sk_buff *skb, int offset,
flags,
};
+ if (slen < len)
+ msg.msg_flags |= more_hint;
bvec_set_page(&bvec, skb_frag_page(frag), slen,
skb_frag_off(frag) + offset);
iov_iter_bvec(&msg.msg_iter, ITER_SOURCE, &bvec, 1,