public inbox for mptcp@lists.linux.dev
* [PATCH mptcp-next] mptcp: preserve MSG_EOR semantics in sendmsg path
@ 2026-03-09  2:54 Gang Yan
  2026-03-09  4:07 ` MPTCP CI
  2026-03-26 16:42 ` Matthieu Baerts
  0 siblings, 2 replies; 6+ messages in thread
From: Gang Yan @ 2026-03-09  2:54 UTC
  To: mptcp; +Cc: pabeni, geliang, Gang Yan

From: Gang Yan <yangang@kylinos.cn>

Extend MPTCP's sendmsg handling to recognize and honor the MSG_EOR flag,
which marks the end of a record for application-level message boundaries.

Data fragments tagged with MSG_EOR are explicitly marked in the
mptcp_data_frag structure and skb context to prevent unintended
coalescing with subsequent data chunks. This ensures the intent of
applications using MSG_EOR is preserved across MPTCP subflows,
maintaining consistent message segmentation behavior.

Signed-off-by: Gang Yan <yangang@kylinos.cn>
---

Notes:
      - This patch incorporates feedback and suggestions from Paolo Abeni
        and Geliang Tang, including memory alignment optimizations for the
        mptcp_data_frag struct (shrinking overhead to u8 and using bitfield
        for eor to avoid size increase) and compile-time checks with BUILD_BUG_ON.
      - Packetdrill test cases validating this feature are available at:
        https://github.com/multipath-tcp/packetdrill/pull/189/changes/d6ce92a4786704fe749bbd848ced0c047632282e

 net/mptcp/protocol.c | 24 ++++++++++++++++++++++--
 net/mptcp/protocol.h |  4 +++-
 2 files changed, 25 insertions(+), 3 deletions(-)

diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c
index 17e43aff4459..3e574c87301b 100644
--- a/net/mptcp/protocol.c
+++ b/net/mptcp/protocol.c
@@ -1174,6 +1174,7 @@ mptcp_carve_data_frag(const struct mptcp_sock *msk, struct page_frag *pfrag,
 	dfrag->offset = offset + sizeof(struct mptcp_data_frag);
 	dfrag->already_sent = 0;
 	dfrag->page = pfrag->page;
+	dfrag->eor = 0;
 
 	return dfrag;
 }
@@ -1435,6 +1436,13 @@ static int mptcp_sendmsg_frag(struct sock *sk, struct sock *ssk,
 		mptcp_update_infinite_map(msk, ssk, mpext);
 	trace_mptcp_sendmsg_frag(mpext);
 	mptcp_subflow_ctx(ssk)->rel_write_seq += copy;
+
+	/* If this is the last chunk of a dfrag with MSG_EOR set,
+	 * mark the skb to prevent coalescing with subsequent data.
+	 */
+	if (dfrag->eor && info->sent + copy >= dfrag->data_len)
+		TCP_SKB_CB(skb)->eor = 1;
+
 	return copy;
 }
 
@@ -1895,7 +1903,8 @@ static int mptcp_sendmsg(struct sock *sk, struct msghdr *msg, size_t len)
 	long timeo;
 
 	/* silently ignore everything else */
-	msg->msg_flags &= MSG_MORE | MSG_DONTWAIT | MSG_NOSIGNAL | MSG_FASTOPEN;
+	msg->msg_flags &= MSG_MORE | MSG_DONTWAIT | MSG_NOSIGNAL |
+			  MSG_FASTOPEN | MSG_EOR;
 
 	lock_sock(sk);
 
@@ -2002,8 +2011,16 @@ static int mptcp_sendmsg(struct sock *sk, struct msghdr *msg, size_t len)
 			goto do_error;
 	}
 
-	if (copied)
+	if (copied) {
+		/* Mark the last dfrag with EOR if MSG_EOR was set */
+		if (msg->msg_flags & MSG_EOR) {
+			struct mptcp_data_frag *dfrag = mptcp_pending_tail(sk);
+
+			if (dfrag)
+				dfrag->eor = 1;
+		}
 		__mptcp_push_pending(sk, msg->msg_flags);
+	}
 
 out:
 	release_sock(sk);
@@ -4621,6 +4638,9 @@ void __init mptcp_proto_init(void)
 	inet_register_protosw(&mptcp_protosw);
 
 	BUILD_BUG_ON(sizeof(struct mptcp_skb_cb) > sizeof_field(struct sk_buff, cb));
+	/* Compile-time check: ensure 'overhead' (alignment + struct size) fits in u8 */
+	BUILD_BUG_ON(ALIGN(1, sizeof(long)) + sizeof(struct mptcp_data_frag) > U8_MAX);
+
 }
 
 #if IS_ENABLED(CONFIG_MPTCP_IPV6)
diff --git a/net/mptcp/protocol.h b/net/mptcp/protocol.h
index f5d4d7d030f2..db96f2945cbd 100644
--- a/net/mptcp/protocol.h
+++ b/net/mptcp/protocol.h
@@ -264,7 +264,9 @@ struct mptcp_data_frag {
 	u64 data_seq;
 	u16 data_len;
 	u16 offset;
-	u16 overhead;
+	u8 overhead;
+	u8 eor:1,
+	   __unused:7;
 	u16 already_sent;
 	struct page *page;
 };
-- 
2.43.0
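
The marking rule added to mptcp_sendmsg_frag() can be illustrated with a small user-space model. This is only a sketch: the struct and function names below are hypothetical stand-ins, not kernel APIs, and 'sent' is assumed to be the number of bytes of the fragment already handed to the subflow.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the kernel structures. */
struct eor_model_dfrag {
	size_t data_len;	/* bytes queued in this fragment */
	bool eor;		/* sendmsg() call carried MSG_EOR */
};

struct eor_model_skb {
	bool eor;		/* models TCP_SKB_CB(skb)->eor */
};

/* Same condition as in the patch: only the chunk that reaches the end
 * of an EOR-tagged fragment marks the skb.
 */
static void eor_model_mark(const struct eor_model_dfrag *dfrag, size_t sent,
			   size_t copy, struct eor_model_skb *skb)
{
	if (dfrag->eor && sent + copy >= dfrag->data_len)
		skb->eor = true;
}
```

In this model, a 3000-byte EOR fragment sent as two chunks (1400 then 1600 bytes) leaves the first skb unmarked and marks only the second one.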



* Re: [PATCH mptcp-next] mptcp: preserve MSG_EOR semantics in sendmsg path
  2026-03-09  2:54 [PATCH mptcp-next] mptcp: preserve MSG_EOR semantics in sendmsg path Gang Yan
@ 2026-03-09  4:07 ` MPTCP CI
  2026-03-26 16:42 ` Matthieu Baerts
  1 sibling, 0 replies; 6+ messages in thread
From: MPTCP CI @ 2026-03-09  4:07 UTC
  To: Gang Yan; +Cc: mptcp

Hi Gang,

Thank you for your modifications, that's great!

Our CI did some validations and here is its report:

- KVM Validation: normal (except selftest_mptcp_join): Success! ✅
- KVM Validation: normal (only selftest_mptcp_join): Success! ✅
- KVM Validation: debug (except selftest_mptcp_join): Unstable: 1 failed test(s): packetdrill_dss 🔴
- KVM Validation: debug (only selftest_mptcp_join): Success! ✅
- KVM Validation: btf-normal (only bpftest_all): Success! ✅
- KVM Validation: btf-debug (only bpftest_all): Success! ✅
- Task: https://github.com/multipath-tcp/mptcp_net-next/actions/runs/22836823300

Initiator: Patchew Applier
Commits: https://github.com/multipath-tcp/mptcp_net-next/commits/070dbf41676b
Patchwork: https://patchwork.kernel.org/project/mptcp/list/?series=1063383


If there are some issues, you can reproduce them using the same environment as
the one used by the CI thanks to a docker image, e.g.:

    $ cd [kernel source code]
    $ docker run -v "${PWD}:${PWD}:rw" -w "${PWD}" --privileged --rm -it \
        --pull always mptcp/mptcp-upstream-virtme-docker:latest \
        auto-normal

For more details:

    https://github.com/multipath-tcp/mptcp-upstream-virtme-docker


Please note that despite all the efforts that have already been made to have a
stable test suite when executed on a public CI like here, it is possible some
reported issues are not due to your modifications. Still, do not hesitate to
help us improve that ;-)

Cheers,
MPTCP GH Action bot
Bot operated by Matthieu Baerts (NGI0 Core)


* Re: [PATCH mptcp-next] mptcp: preserve MSG_EOR semantics in sendmsg path
  2026-03-09  2:54 [PATCH mptcp-next] mptcp: preserve MSG_EOR semantics in sendmsg path Gang Yan
  2026-03-09  4:07 ` MPTCP CI
@ 2026-03-26 16:42 ` Matthieu Baerts
  2026-03-30  8:19   ` gang.yan
  1 sibling, 1 reply; 6+ messages in thread
From: Matthieu Baerts @ 2026-03-26 16:42 UTC
  To: Gang Yan, mptcp; +Cc: pabeni, geliang, Gang Yan

Hi Gang,

Thank you for the new version.

On 09/03/2026 03:54, Gang Yan wrote:
> From: Gang Yan <yangang@kylinos.cn>
> 
> Extend MPTCP's sendmsg handling to recognize and honor the MSG_EOR flag,
> which marks the end of a record for application-level message boundaries.
> 
> Data fragments tagged with MSG_EOR are explicitly marked in the
> mptcp_data_frag structure and skb context to prevent unintended
> coalescing with subsequent data chunks. This ensures the intent of
> applications using MSG_EOR is preserved across MPTCP subflows,
> maintaining consistent message segmentation behavior.
> 
> Signed-off-by: Gang Yan <yangang@kylinos.cn>
> ---
> 
> Notes:
>       - This patch incorporates feedback and suggestions from Paolo Abeni
>         and Geliang Tang, including memory alignment optimizations for the
>         mptcp_data_frag struct (shrinking overhead to u8 and using bitfield
>         for eor to avoid size increase) and compile-time checks with BUILD_BUG_ON.

Please mention why you shrank "overhead" to a u8 (not to increase the
struct size), and why it is OK to do so (u16 not needed because ...) +
explaining the BUILD_BUG_ON().

>       - Packetdrill test cases validating this feature are available at:
>         https://github.com/multipath-tcp/packetdrill/pull/189/changes/d6ce92a4786704fe749bbd848ced0c047632282e

Thank you, I just reviewed it.

Do you mind checking the AI review there please:


https://netdev-ai.bots.linux.dev/ai-review.html?id=22434689-7326-48c8-af75-273d99fbef55

I think it is valid, but better to double-check.

> diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c
> index 17e43aff4459..3e574c87301b 100644
> --- a/net/mptcp/protocol.c
> +++ b/net/mptcp/protocol.c

(...)

> @@ -4621,6 +4638,9 @@ void __init mptcp_proto_init(void)
>  	inet_register_protosw(&mptcp_protosw);
>  
>  	BUILD_BUG_ON(sizeof(struct mptcp_skb_cb) > sizeof_field(struct sk_buff, cb));
> +	/* Compile-time check: ensure 'overhead' (alignment + struct size) fits in u8 */
> +	BUILD_BUG_ON(ALIGN(1, sizeof(long)) + sizeof(struct mptcp_data_frag) > U8_MAX);

Sorry, I'm not sure what you are checking here. Do you mind explaining
it please?

Cheers,
Matt
-- 
Sponsored by the NGI0 Core fund.



* Re: [PATCH mptcp-next] mptcp: preserve MSG_EOR semantics in sendmsg path
  2026-03-26 16:42 ` Matthieu Baerts
@ 2026-03-30  8:19   ` gang.yan
  2026-03-30  9:50     ` Matthieu Baerts
  0 siblings, 1 reply; 6+ messages in thread
From: gang.yan @ 2026-03-30  8:19 UTC
  To: Matthieu Baerts, mptcp

March 27, 2026 at 12:42 AM, "Matthieu Baerts" <matttbe@kernel.org> wrote:


> 
> Hi Gang,
> 
> Thank you for the new version.
> 
> On 09/03/2026 03:54, Gang Yan wrote:
> 
> > 
> > From: Gang Yan <yangang@kylinos.cn>
> >  
> >  Extend MPTCP's sendmsg handling to recognize and honor the MSG_EOR flag,
> >  which marks the end of a record for application-level message boundaries.
> >  
> >  Data fragments tagged with MSG_EOR are explicitly marked in the
> >  mptcp_data_frag structure and skb context to prevent unintended
> >  coalescing with subsequent data chunks. This ensures the intent of
> >  applications using MSG_EOR is preserved across MPTCP subflows,
> >  maintaining consistent message segmentation behavior.
> >  
> >  Signed-off-by: Gang Yan <yangang@kylinos.cn>
> >  ---
> >  
> >  Notes:
> >  - This patch incorporates feedback and suggestions from Paolo Abeni
> >  and Geliang Tang, including memory alignment optimizations for the
> >  mptcp_data_frag struct (shrinking overhead to u8 and using bitfield
> >  for eor to avoid size increase) and compile-time checks with BUILD_BUG_ON.
> > 
> Please mention why you shrank "overhead" to a u8 (not to increase the
> struct size), and why it is OK to do so (u16 not needed because ...) +
> explaining the BUILD_BUG_ON().

The 'u8' is one of Paolo's suggestions[1]. I think 'u16' is not needed because:
 - 'offset = ALIGN(orig_offset, sizeof(long));'
 - 'dfrag->overhead = offset - orig_offset + sizeof(struct mptcp_data_frag);',
the max value of offset is 7, and sizeof(struct mptcp_data_frag) is
usually 40, so the overhead is at most 47, far less than 255.

Another suggestion from Paolo[1] is a build-time check on the max 'overhead'
value. So I use 'ALIGN(1, sizeof(long)) + sizeof(struct mptcp_data_frag)' to
represent the maximum value of 'overhead'.

But Paolo also mentioned that it's probably too conservative. WDYT?

[1] https://patchwork.kernel.org/project/mptcp/patch/20260203023029.855434-1-gang.yan@linux.dev/
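
To make the arithmetic above concrete, here is a user-space sketch of the bound. Everything in it is an assumption for illustration: MODEL_ALIGN mimics the kernel's ALIGN() for power-of-two alignments, and 'struct model_data_frag' is a hypothetical copy of the patched layout whose leading list node is not shown in this thread.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* User-space equivalent of ALIGN() for a power-of-two 'a' (assumption). */
#define MODEL_ALIGN(x, a) (((x) + ((a) - 1)) & ~((size_t)(a) - 1))

/* Hypothetical model of the patched struct; the list node is assumed. */
struct model_data_frag {
	struct { void *next, *prev; } list;
	uint64_t data_seq;
	uint16_t data_len;
	uint16_t offset;
	uint8_t overhead;
	unsigned char eor : 1,
		      unused_bits : 7;
	uint16_t already_sent;
	void *page;
};

/* Largest padding that aligning to sizeof(long) can add. */
static size_t model_max_padding(void)
{
	return sizeof(long) - 1;
}

/* Bound checked by the patch: one full alignment unit plus the header. */
static size_t model_overhead_bound(void)
{
	return MODEL_ALIGN((size_t)1, sizeof(long)) +
	       sizeof(struct model_data_frag);
}

/* Compile-time counterpart of the BUILD_BUG_ON() in the patch. */
static_assert(MODEL_ALIGN((size_t)1, sizeof(long)) +
	      sizeof(struct model_data_frag) <= UINT8_MAX,
	      "overhead must fit in a u8");
```

The bound is one byte larger than the real worst case (padding of sizeof(long) - 1), which may be what "too conservative" refers to; either way it stays far below 255 for a header of this size.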

> 
> > 
> > - Packetdrill test cases validating this feature are available at:
> >  https://github.com/multipath-tcp/packetdrill/pull/189/changes/d6ce92a4786704fe749bbd848ced0c047632282e
> > 
> Thank you, I just reviewed it.

Thanks, I'll try to fix them.

> 
> Do you mind checking the AI review there please:
> 
> https://netdev-ai.bots.linux.dev/ai-review.html?id=22434689-7326-48c8-af75-273d99fbef55
> 
> I think it is valid, but better to double-check.

Yes, I think it's a good catch, and we should fix it as follows:

@@ -1032,7 +1032,8 @@ static bool mptcp_frag_can_collapse_to(const struct mptcp_sock *msk,
                                       const struct page_frag *pfrag,
                                       const struct mptcp_data_frag *df)
 {
-       return df && pfrag->page == df->page &&
+       return df && !df->eor &&
+               pfrag->page == df->page &&
                pfrag->size - pfrag->offset > 0 &&
                pfrag->offset == (df->offset + df->data_len) &&
                df->data_seq + df->data_len == msk->write_seq;

If OK, I'll apply it when sending v2.
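
The effect of the extra '!df->eor' test can be checked with a small user-space model. This is a sketch under stated assumptions: the types and the function name are hypothetical simplifications, and 'write_seq' stands in for msk->write_seq.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-ins for the kernel types. */
struct collapse_pfrag {
	const void *page;
	size_t offset;		/* next free byte in the page */
	size_t size;		/* usable bytes in the page */
};

struct collapse_dfrag {
	const void *page;
	uint64_t data_seq;
	uint16_t data_len;
	uint16_t offset;
	bool eor;
};

/* Same tests as the proposed fix: an EOR-tagged fragment never accepts
 * more data, even when the page and sequence numbers are contiguous.
 */
static bool collapse_model_ok(uint64_t write_seq,
			      const struct collapse_pfrag *pfrag,
			      const struct collapse_dfrag *df)
{
	return df && !df->eor &&
	       pfrag->page == df->page &&
	       pfrag->size - pfrag->offset > 0 &&
	       pfrag->offset == (size_t)df->offset + df->data_len &&
	       df->data_seq + df->data_len == write_seq;
}
```

With a contiguous fragment the model allows collapsing, and setting 'eor' on the same fragment is enough to refuse it, so the next sendmsg() would start a new fragment.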

> 
> > 
> > diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c
> >  index 17e43aff4459..3e574c87301b 100644
> >  --- a/net/mptcp/protocol.c
> >  +++ b/net/mptcp/protocol.c
> > 
> (...)
> 
> > 
> > @@ -4621,6 +4638,9 @@ void __init mptcp_proto_init(void)
> >  inet_register_protosw(&mptcp_protosw);
> >  
> >  BUILD_BUG_ON(sizeof(struct mptcp_skb_cb) > sizeof_field(struct sk_buff, cb));
> >  + /* Compile-time check: ensure 'overhead' (alignment + struct size) fits in u8 */
> >  + BUILD_BUG_ON(ALIGN(1, sizeof(long)) + sizeof(struct mptcp_data_frag) > U8_MAX);
> > 
> Sorry, I'm not sure what you are checking here. Do you mind explaining
> it please?
> 

The 'BUILD_BUG_ON' is explained at the beginning of the reply, thanks.

Cheers,
Gang

> Cheers,
> Matt
> -- 
> Sponsored by the NGI0 Core fund.
>


* Re: [PATCH mptcp-next] mptcp: preserve MSG_EOR semantics in sendmsg path
  2026-03-30  8:19   ` gang.yan
@ 2026-03-30  9:50     ` Matthieu Baerts
  2026-03-31  6:54       ` gang.yan
  0 siblings, 1 reply; 6+ messages in thread
From: Matthieu Baerts @ 2026-03-30  9:50 UTC
  To: gang.yan, mptcp

Hi Gang,

On 30/03/2026 10:19, gang.yan@linux.dev wrote:
> March 27, 2026 at 12:42 AM, "Matthieu Baerts" <matttbe@kernel.org> wrote:
> 
> 
>>
>> Hi Gang,
>>
>> Thank you for the new version.
>>
>> On 09/03/2026 03:54, Gang Yan wrote:
>>
>>>
>>> From: Gang Yan <yangang@kylinos.cn>
>>>  
>>>  Extend MPTCP's sendmsg handling to recognize and honor the MSG_EOR flag,
>>>  which marks the end of a record for application-level message boundaries.
>>>  
>>>  Data fragments tagged with MSG_EOR are explicitly marked in the
>>>  mptcp_data_frag structure and skb context to prevent unintended
>>>  coalescing with subsequent data chunks. This ensures the intent of
>>>  applications using MSG_EOR is preserved across MPTCP subflows,
>>>  maintaining consistent message segmentation behavior.
>>>  
>>>  Signed-off-by: Gang Yan <yangang@kylinos.cn>
>>>  ---
>>>  
>>>  Notes:
>>>  - This patch incorporates feedback and suggestions from Paolo Abeni
>>>  and Geliang Tang, including memory alignment optimizations for the
>>>  mptcp_data_frag struct (shrinking overhead to u8 and using bitfield
>>>  for eor to avoid size increase) and compile-time checks with BUILD_BUG_ON.
>>>
>> Please mention why you shrank "overhead" to a u8 (not to increase the
>> struct size), and why it is OK to do so (u16 not needed because ...) +
>> explaining the BUILD_BUG_ON().
> 
> The 'u8' is one of Paolo's suggestions[1]. I think 'u16' is not needed because:
>  - 'offset = ALIGN(orig_offset, sizeof(long));'
>  - 'dfrag->overhead = offset - orig_offset + sizeof(struct mptcp_data_frag);',
> the max value of offset is 7, and sizeof(struct mptcp_data_frag) is
> usually 40, so the overhead is at most 47, far less than 255.

Thank you for the explanation. Can you then mention in the commit
message that it is fine to reduce overhead to a 'u8', and add the above
explanation, please?

If 'offset' max value is 7, it could also be reduced from a u16 to a u8
then, no?

> Another suggestion from Paolo[1] is a build time check on the max 'overhead'
> value. So I use 'ALIGN(1, sizeof(long)) + sizeof(struct mptcp_data_frag)' to
> represent the max_val of 'overhead'.

It might be good to add a comment here too, at least to explain that
"ALIGN(1, sizeof(long))" represents 'offset' maximum size.

> But Paolo also mentioned that it's probably too conservative. WDYT?

Maybe, but it doesn't hurt I suppose, as long as this check is clearly
linked to the relevant fields of the mptcp_data_frag structure, e.g. by
having a comment explaining that.

Cheers,
Matt
-- 
Sponsored by the NGI0 Core fund.



* Re: [PATCH mptcp-next] mptcp: preserve MSG_EOR semantics in sendmsg path
  2026-03-30  9:50     ` Matthieu Baerts
@ 2026-03-31  6:54       ` gang.yan
  0 siblings, 0 replies; 6+ messages in thread
From: gang.yan @ 2026-03-31  6:54 UTC
  To: Matthieu Baerts, mptcp

March 30, 2026 at 5:50 PM, "Matthieu Baerts" <matttbe@kernel.org> wrote:


> 
> Hi Gang,
> 
> On 30/03/2026 10:19, gang.yan@linux.dev wrote:
> 
> > 
> > March 27, 2026 at 12:42 AM, "Matthieu Baerts" <matttbe@kernel.org> wrote:
> >  
> >  
> > 
> > > 
> > > Hi Gang,
> > > 
> > >  Thank you for the new version.
> > > 
> > >  On 09/03/2026 03:54, Gang Yan wrote:
> > > 
> >  From: Gang Yan <yangang@kylinos.cn>
> >  
> >  Extend MPTCP's sendmsg handling to recognize and honor the MSG_EOR flag,
> >  which marks the end of a record for application-level message boundaries.
> >  
> >  Data fragments tagged with MSG_EOR are explicitly marked in the
> >  mptcp_data_frag structure and skb context to prevent unintended
> >  coalescing with subsequent data chunks. This ensures the intent of
> >  applications using MSG_EOR is preserved across MPTCP subflows,
> >  maintaining consistent message segmentation behavior.
> >  
> >  Signed-off-by: Gang Yan <yangang@kylinos.cn>
> >  ---
> >  
> >  Notes:
> >  - This patch incorporates feedback and suggestions from Paolo Abeni
> >  and Geliang Tang, including memory alignment optimizations for the
> >  mptcp_data_frag struct (shrinking overhead to u8 and using bitfield
> >  for eor to avoid size increase) and compile-time checks with BUILD_BUG_ON.
> > 
> > > 
> > > Please mention why you shrank "overhead" to a u8 (not to increase the
> > >  struct size), and why it is OK to do so (u16 not needed because ...) +
> > >  explaining the BUILD_BUG_ON().
> > > 
> >  
> >  The 'u8' is one of Paolo's suggestions[1]. I think 'u16' is not needed because:
> >  - 'offset = ALIGN(orig_offset, sizeof(long));'
> >  - 'dfrag->overhead = offset - orig_offset + sizeof(struct mptcp_data_frag);',
> >  the max value of offset is 7, and sizeof(struct mptcp_data_frag) is
> >  usually 40, so the overhead is at most 47, far less than 255.
> > 
> Thank you for the explanation. Can you then mention in the commit
> message that it is fine to reduce overhead to a 'u8', and add the above
> explanation, please?
> 
> If 'offset' max value is 7, it could also be reduced from a u16 to a u8
> then, no?

Hi, Matt:

Sorry, there was an error in my explanation: it is the maximum value of
(offset - orig_offset) that is 7, not the maximum value of 'offset' itself.
So the 'offset' field should stay a u16.
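
The difference between the two fields can be seen in a short user-space sketch. It assumes that 'overhead' is the alignment padding plus the header size and that 'offset' is the payload position inside the page; CARVE_ALIGN, CARVE_HDR_SIZE (the 40 bytes quoted earlier) and carve_model() are hypothetical names used only for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* User-space equivalent of ALIGN() for a power-of-two 'a' (assumption). */
#define CARVE_ALIGN(x, a) (((x) + ((a) - 1)) & ~((size_t)(a) - 1))

/* Assumed header size: the value quoted in this thread as "usually 40". */
#define CARVE_HDR_SIZE 40u

struct carve_result {
	size_t overhead;	/* padding + header: small, fits in a u8 */
	size_t offset;		/* payload position in the page: needs a u16 */
};

/* Models how the two fields would be derived from the page offset. */
static struct carve_result carve_model(size_t orig_offset)
{
	size_t aligned = CARVE_ALIGN(orig_offset, sizeof(long));
	struct carve_result r = {
		.overhead = aligned - orig_offset + CARVE_HDR_SIZE,
		.offset = aligned + CARVE_HDR_SIZE,
	};

	return r;
}
```

For example, on a build where sizeof(long) is 8, carve_model(1001) gives an overhead of 47 but an offset of 1048: the overhead stays bounded by the padding, while the offset grows with the position in the page.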

> 
> > 
> > Another suggestion from Paolo[1] is a build time check on the max 'overhead'
> >  value. So I use 'ALIGN(1, sizeof(long)) + sizeof(struct mptcp_data_frag)' to
> >  represent the max_val of 'overhead'.
> > 
> It might be good to add a comment here too, at least to explain that
> "ALIGN(1, sizeof(long))" represents 'offset' maximum size.

Good idea, I'll apply your suggestions in v2.

Thanks
Gang
> 
> > 
> > But Paolo also mentioned that it's probably too conservative. WDYT?
> > 
> Maybe, but it doesn't hurt I suppose. As long as this check is clearly
> linked to different fields from the mptcp_data_frag structure → having a
> comment explaining that.
> 
> Cheers,
> Matt
> -- 
> Sponsored by the NGI0 Core fund.
>


end of thread, other threads:[~2026-03-31  6:55 UTC | newest]

Thread overview: 6+ messages
2026-03-09  2:54 [PATCH mptcp-next] mptcp: preserve MSG_EOR semantics in sendmsg path Gang Yan
2026-03-09  4:07 ` MPTCP CI
2026-03-26 16:42 ` Matthieu Baerts
2026-03-30  8:19   ` gang.yan
2026-03-30  9:50     ` Matthieu Baerts
2026-03-31  6:54       ` gang.yan
