Re: [PATCH mptcp-next] mptcp: record subflows in RPS table

All of lore.kernel.org
 help / color / mirror / Atom feed

From: Matthieu Baerts <matttbe@kernel.org>
To: Christoph Paasch <cpaasch@openai.com>
Cc: MPTCP Upstream <mptcp@lists.linux.dev>,
	Geliang Tang <geliang@kernel.org>,
	Mat Martineau <martineau@kernel.org>
Subject: Re: [PATCH mptcp-next] mptcp: record subflows in RPS table
Date: Thu, 14 Aug 2025 13:47:14 +0200	[thread overview]
Message-ID: <3decc4cd-bbf1-45f1-9985-d2b795c5bfcc@kernel.org> (raw)
In-Reply-To: <4d933d54-fff1-4a9d-9601-753b8cddf934@kernel.org>

Hi Christoph,

On 14/08/2025 13:20, Matthieu Baerts wrote:
> Hi Christoph,
> 
> On 14/08/2025 05:31, Christoph Paasch wrote:
>> On Wed, Aug 13, 2025 at 7:54 AM Matthieu Baerts <matttbe@kernel.org> wrote:
>>>
>>> Hi Christoph,
>>>
>>> On 07/08/2025 20:14, Christoph Paasch via B4 Relay wrote:
>>>> From: Christoph Paasch <cpaasch@openai.com>
>>>>
>>>> Accelerated Receive Flow Steering (aRFS) relies on sockets recording
>>>> their RX flow hash into the rps_sock_flow_table so that incoming packets
>>>> are steered to the CPU where the application runs.
>>>>
>>>> With MPTCP, the application interacts with the parent MPTCP socket while
>>>> data is carried over per-subflow TCP sockets. Without recording these
>>>> subflows, aRFS cannot steer interrupts and RX processing for the flows
>>>> to the desired CPU.
>>>>
>>>> Record all subflows in the RPS table by calling sock_rps_record_flow()
>>>> for each subflow at the start of mptcp_sendmsg() and mptcp_recvmsg().
>>>>
>>>> It does not by itself improve throughput, but ensures that IRQ and RX
>>>> processing are directed to the right CPU, which is a
>>>> prerequisite for effective aRFS.
>>>
>>> Thank you for looking at that!
>>>
>>>> Signed-off-by: Christoph Paasch <cpaasch@openai.com>
>>>> ---
>>>>  net/mptcp/protocol.c | 16 ++++++++++++++++
>>>>  1 file changed, 16 insertions(+)
>>>>
>>>> diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c
>>>> index 4b510e04724f5c8be08a00a1cc03093bcd031905..86f1fc5be47bd5f6dc8bf40a488de32f253c1beb 100644
>>>> --- a/net/mptcp/protocol.c
>>>> +++ b/net/mptcp/protocol.c
>>>> @@ -12,6 +12,7 @@
>>>>  #include <linux/sched/signal.h>
>>>>  #include <linux/atomic.h>
>>>>  #include <net/aligned_data.h>
>>>> +#include <net/rps.h>
>>>>  #include <net/sock.h>
>>>>  #include <net/inet_common.h>
>>>>  #include <net/inet_hashtables.h>
>>>> @@ -1740,6 +1741,17 @@ static u32 mptcp_send_limit(const struct sock *sk)
>>>>       return limit - not_sent;
>>>>  }
>>>>
>>>> +static void mptcp_rps_record_subflows(const struct mptcp_sock *msk)
>>>> +{
>>>> +     struct mptcp_subflow_context *subflow;
>>>> +
>>>> +     mptcp_for_each_subflow(msk, subflow) {
>>>> +             struct sock *ssk = mptcp_subflow_tcp_sock(subflow);
>>>> +
>>>> +             sock_rps_record_flow(ssk);
>>>
>>> Should we eventually open code this RPS function and directly call
>>> sock_rps_record_flow_hash()? If RPS is not needed, no need to go through
>>> the subflow list. So we could add: #ifdef CONFIG_RPS + &rfs_needed
>>> before the for_each, then check the state for each subflow, no?
>>>
>>> Or maybe such optimisation is not worth it?
>>
>> IMO it's not worth it. Because it would also mean that we pull in more
>> of non-MPTCP things over to MPTCP files.
> 
> Good point for the maintenance!
> 
> But then, what about adding a small helper in include/net/rps.h? Here we
> could have this at the top of mptcp_rps_record_subflows():
> 
>   if (!sock_rps_record_needed())   /* with likely()? */
>       return;
> 
> I'm suggesting this because rfs is disabled by default, so it sounds
> better to check that before iterating over different subflows for
> nothing, no?
> 
>>> While at it, I see that sock_rps_record_flow() is called in
>>> __inet_accept(), inet(6)_sendmsg and inet(6)_recvmsg, but with the msk.
>>> I guess that's OK, right?
>>
>> AFAICS, that should be alright. sk_rxhash should not be set on an msk.
>> Even on the passive opener side, we clone msk from the ssk, but at
>> that point sk_rxhash is not set yet. (I may be wrong of course, I'm
>> still not super familiar with the source-code...)
> 
> Thank you for having looked! I guess you would have seen issues when
> testing if there was a problem with that, so no need to investigate more.
> 
>>> Also, is sock_rps_record_flow() called when a subflow is "accept()ed",
>>> or is there something else to do there for this subflow to copy TCP's
>>> behaviour?
>>
>> For new subflows, I don't think we can easily do that. Because
>> sock_rps_record_flow() records the CPU on which it is running. From
>> what I see, when new subflows get accepted, that is not happening in
>> the application's context necessarily. So, we have to wait for a
>> recvmsg/sendmsg.
> 
> Ah yes, of course, we should not do it there for additional subflows!
> And for the first subflow, I think that it will be done because when an
> accept() is triggered from the userspace, mptcp_stream_accept(msk, arg)
> will call inet_csk_accept(ssk, arg) that will end up doing that I think.

My bad, I think __inet_accept will not be called from inet_csk_accept().
For TCP, I think the path is:

  inet_accept()
    --> inet_csk_accept()
    --> __inet_accept()

But with MPTCP, mptcp_stream_accept is called instead of inet_accept(),
and I don't think __inet_accept() will be called on a subflow. (I didn't
validate that.) Then, maybe we should call sock_rps_record_flow() on the
newsk in mptcp_stream_accept, no?

Cheers,
Matt
-- 
Sponsored by the NGI0 Core fund.

next prev parent reply	other threads:[~2025-08-14 11:47 UTC|newest]

Thread overview: 8+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-08-07 18:14 [PATCH mptcp-next] mptcp: record subflows in RPS table Christoph Paasch
2025-08-07 18:14 ` Christoph Paasch via B4 Relay
2025-08-07 19:46 ` MPTCP CI
2025-08-13 14:53 ` Matthieu Baerts
2025-08-14  3:31   ` Christoph Paasch
2025-08-14 11:20     ` Matthieu Baerts
2025-08-14 11:47       ` Matthieu Baerts [this message]
2025-08-26  4:10         ` Christoph Paasch

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=3decc4cd-bbf1-45f1-9985-d2b795c5bfcc@kernel.org \
    --to=matttbe@kernel.org \
    --cc=cpaasch@openai.com \
    --cc=geliang@kernel.org \
    --cc=martineau@kernel.org \
    --cc=mptcp@lists.linux.dev \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.