* [PATCH mptcp-next] mptcp: record subflows in RPS table @ 2025-08-07 18:14 ` Christoph Paasch via B4 Relay 0 siblings, 0 replies; 8+ messages in thread From: Christoph Paasch @ 2025-08-07 18:14 UTC (permalink / raw) To: MPTCP Upstream, Matthieu Baerts, Mat Martineau, Geliang Tang Cc: Christoph Paasch Accelerated Receive Flow Steering (aRFS) relies on sockets recording their RX flow hash into the rps_sock_flow_table so that incoming packets are steered to the CPU where the application runs. With MPTCP, the application interacts with the parent MPTCP socket while data is carried over per-subflow TCP sockets. Without recording these subflows, aRFS cannot steer interrupts and RX processing for the flows to the desired CPU. Record all subflows in the RPS table by calling sock_rps_record_flow() for each subflow at the start of mptcp_sendmsg() and mptcp_recvmsg(). It does not by itself improve throughput, but ensures that IRQ and RX processing are directed to the right CPU, which is a prerequisite for effective aRFS. Signed-off-by: Christoph Paasch <cpaasch@openai.com> --- net/mptcp/protocol.c | 16 ++++++++++++++++ 1 file changed, 16 insertions(+) diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c index 4b510e04724f5c8be08a00a1cc03093bcd031905..86f1fc5be47bd5f6dc8bf40a488de32f253c1beb 100644 --- a/net/mptcp/protocol.c +++ b/net/mptcp/protocol.c @@ -12,6 +12,7 @@ #include <linux/sched/signal.h> #include <linux/atomic.h> #include <net/aligned_data.h> +#include <net/rps.h> #include <net/sock.h> #include <net/inet_common.h> #include <net/inet_hashtables.h> @@ -1740,6 +1741,17 @@ static u32 mptcp_send_limit(const struct sock *sk) return limit - not_sent; } +static void mptcp_rps_record_subflows(const struct mptcp_sock *msk) +{ + struct mptcp_subflow_context *subflow; + + mptcp_for_each_subflow(msk, subflow) { + struct sock *ssk = mptcp_subflow_tcp_sock(subflow); + + sock_rps_record_flow(ssk); + } +} + static int mptcp_sendmsg(struct sock *sk, struct msghdr *msg, size_t len) { struct mptcp_sock *msk = mptcp_sk(sk); @@ -1753,6 +1765,8 @@ static int mptcp_sendmsg(struct sock *sk, struct msghdr *msg, size_t len) lock_sock(sk); + mptcp_rps_record_subflows(msk); + if (unlikely(inet_test_bit(DEFER_CONNECT, sk) || msg->msg_flags & MSG_FASTOPEN)) { int copied_syn = 0; @@ -2131,6 +2145,8 @@ static int mptcp_recvmsg(struct sock *sk, struct msghdr *msg, size_t len, goto out_err; } + mptcp_rps_record_subflows(msk); + timeo = sock_rcvtimeo(sk, flags & MSG_DONTWAIT); len = min_t(size_t, len, INT_MAX); --- base-commit: f756e5bb29fa59378c111a2401fc64e44cd53442 change-id: 20250731-b4-mptcp_perf-02912dd4f65b Best regards, -- Christoph Paasch <cpaasch@openai.com> ^ permalink raw reply related [flat|nested] 8+ messages in thread
* [PATCH mptcp-next] mptcp: record subflows in RPS table @ 2025-08-07 18:14 ` Christoph Paasch via B4 Relay 0 siblings, 0 replies; 8+ messages in thread From: Christoph Paasch via B4 Relay @ 2025-08-07 18:14 UTC (permalink / raw) To: MPTCP Upstream, Matthieu Baerts, Mat Martineau, Geliang Tang Cc: Christoph Paasch From: Christoph Paasch <cpaasch@openai.com> Accelerated Receive Flow Steering (aRFS) relies on sockets recording their RX flow hash into the rps_sock_flow_table so that incoming packets are steered to the CPU where the application runs. With MPTCP, the application interacts with the parent MPTCP socket while data is carried over per-subflow TCP sockets. Without recording these subflows, aRFS cannot steer interrupts and RX processing for the flows to the desired CPU. Record all subflows in the RPS table by calling sock_rps_record_flow() for each subflow at the start of mptcp_sendmsg() and mptcp_recvmsg(). It does not by itself improve throughput, but ensures that IRQ and RX processing are directed to the right CPU, which is a prerequisite for effective aRFS. Signed-off-by: Christoph Paasch <cpaasch@openai.com> --- net/mptcp/protocol.c | 16 ++++++++++++++++ 1 file changed, 16 insertions(+) diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c index 4b510e04724f5c8be08a00a1cc03093bcd031905..86f1fc5be47bd5f6dc8bf40a488de32f253c1beb 100644 --- a/net/mptcp/protocol.c +++ b/net/mptcp/protocol.c @@ -12,6 +12,7 @@ #include <linux/sched/signal.h> #include <linux/atomic.h> #include <net/aligned_data.h> +#include <net/rps.h> #include <net/sock.h> #include <net/inet_common.h> #include <net/inet_hashtables.h> @@ -1740,6 +1741,17 @@ static u32 mptcp_send_limit(const struct sock *sk) return limit - not_sent; } +static void mptcp_rps_record_subflows(const struct mptcp_sock *msk) +{ + struct mptcp_subflow_context *subflow; + + mptcp_for_each_subflow(msk, subflow) { + struct sock *ssk = mptcp_subflow_tcp_sock(subflow); + + sock_rps_record_flow(ssk); + } +} + static int mptcp_sendmsg(struct sock *sk, struct msghdr *msg, size_t len) { struct mptcp_sock *msk = mptcp_sk(sk); @@ -1753,6 +1765,8 @@ static int mptcp_sendmsg(struct sock *sk, struct msghdr *msg, size_t len) lock_sock(sk); + mptcp_rps_record_subflows(msk); + if (unlikely(inet_test_bit(DEFER_CONNECT, sk) || msg->msg_flags & MSG_FASTOPEN)) { int copied_syn = 0; @@ -2131,6 +2145,8 @@ static int mptcp_recvmsg(struct sock *sk, struct msghdr *msg, size_t len, goto out_err; } + mptcp_rps_record_subflows(msk); + timeo = sock_rcvtimeo(sk, flags & MSG_DONTWAIT); len = min_t(size_t, len, INT_MAX); --- base-commit: f756e5bb29fa59378c111a2401fc64e44cd53442 change-id: 20250731-b4-mptcp_perf-02912dd4f65b Best regards, -- Christoph Paasch <cpaasch@openai.com> ^ permalink raw reply related [flat|nested] 8+ messages in thread
* Re: [PATCH mptcp-next] mptcp: record subflows in RPS table 2025-08-07 18:14 ` Christoph Paasch via B4 Relay (?) @ 2025-08-07 19:46 ` MPTCP CI -1 siblings, 0 replies; 8+ messages in thread From: MPTCP CI @ 2025-08-07 19:46 UTC (permalink / raw) To: Christoph Paasch; +Cc: mptcp Hi Christoph, Thank you for your modifications, that's great! Our CI did some validations and here is its report: - KVM Validation: normal: Success! ✅ - KVM Validation: debug: Success! ✅ - KVM Validation: btf-normal (only bpftest_all): Success! ✅ - KVM Validation: btf-debug (only bpftest_all): Success! ✅ - Task: https://github.com/multipath-tcp/mptcp_net-next/actions/runs/16812537653 Initiator: Patchew Applier Commits: https://github.com/multipath-tcp/mptcp_net-next/commits/8452b3707158 Patchwork: https://patchwork.kernel.org/project/mptcp/list/?series=989208 If there are some issues, you can reproduce them using the same environment as the one used by the CI thanks to a docker image, e.g.: $ cd [kernel source code] $ docker run -v "${PWD}:${PWD}:rw" -w "${PWD}" --privileged --rm -it \ --pull always mptcp/mptcp-upstream-virtme-docker:latest \ auto-normal For more details: https://github.com/multipath-tcp/mptcp-upstream-virtme-docker Please note that despite all the efforts that have been already done to have a stable tests suite when executed on a public CI like here, it is possible some reported issues are not due to your modifications. Still, do not hesitate to help us improve that ;-) Cheers, MPTCP GH Action bot Bot operated by Matthieu Baerts (NGI0 Core) ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH mptcp-next] mptcp: record subflows in RPS table 2025-08-07 18:14 ` Christoph Paasch via B4 Relay (?) (?) @ 2025-08-13 14:53 ` Matthieu Baerts 2025-08-14 3:31 ` Christoph Paasch -1 siblings, 1 reply; 8+ messages in thread From: Matthieu Baerts @ 2025-08-13 14:53 UTC (permalink / raw) To: cpaasch; +Cc: MPTCP Upstream, Geliang Tang, Mat Martineau Hi Christoph, On 07/08/2025 20:14, Christoph Paasch via B4 Relay wrote: > From: Christoph Paasch <cpaasch@openai.com> > > Accelerated Receive Flow Steering (aRFS) relies on sockets recording > their RX flow hash into the rps_sock_flow_table so that incoming packets > are steered to the CPU where the application runs. > > With MPTCP, the application interacts with the parent MPTCP socket while > data is carried over per-subflow TCP sockets. Without recording these > subflows, aRFS cannot steer interrupts and RX processing for the flows > to the desired CPU. > > Record all subflows in the RPS table by calling sock_rps_record_flow() > for each subflow at the start of mptcp_sendmsg() and mptcp_recvmsg(). > > It does not by itself improve throughput, but ensures that IRQ and RX > processing are directed to the right CPU, which is a > prerequisite for effective aRFS. Thank you for looking at that! > Signed-off-by: Christoph Paasch <cpaasch@openai.com> > --- > net/mptcp/protocol.c | 16 ++++++++++++++++ > 1 file changed, 16 insertions(+) > > diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c > index 4b510e04724f5c8be08a00a1cc03093bcd031905..86f1fc5be47bd5f6dc8bf40a488de32f253c1beb 100644 > --- a/net/mptcp/protocol.c > +++ b/net/mptcp/protocol.c > @@ -12,6 +12,7 @@ > #include <linux/sched/signal.h> > #include <linux/atomic.h> > #include <net/aligned_data.h> > +#include <net/rps.h> > #include <net/sock.h> > #include <net/inet_common.h> > #include <net/inet_hashtables.h> > @@ -1740,6 +1741,17 @@ static u32 mptcp_send_limit(const struct sock *sk) > return limit - not_sent; > } > > +static void mptcp_rps_record_subflows(const struct mptcp_sock *msk) > +{ > + struct mptcp_subflow_context *subflow; > + > + mptcp_for_each_subflow(msk, subflow) { > + struct sock *ssk = mptcp_subflow_tcp_sock(subflow); > + > + sock_rps_record_flow(ssk); Should we eventually open code this RPS function and directly call sock_rps_record_flow_hash()? If RPS is not needed, no need to go through the subflow list. So we could add: #ifdef CONFIG_RPS + &rfs_needed before the for_each, then check the state for each subflow, no? Or maybe such optimisation is not worth it? While at it, I see that sock_rps_record_flow() is called in __inet_accept(), inet(6)_sendmsg and inet(6)_recvmsg, but with the msk. I guess that's OK, right? Also, is sock_rps_record_flow() called when a subflow is "accept()ed", or is there something else to do there for this subflow to copy TCP's behaviour? Cheers, Matt -- Sponsored by the NGI0 Core fund. ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH mptcp-next] mptcp: record subflows in RPS table 2025-08-13 14:53 ` Matthieu Baerts @ 2025-08-14 3:31 ` Christoph Paasch 2025-08-14 11:20 ` Matthieu Baerts 0 siblings, 1 reply; 8+ messages in thread From: Christoph Paasch @ 2025-08-14 3:31 UTC (permalink / raw) To: Matthieu Baerts; +Cc: MPTCP Upstream, Geliang Tang, Mat Martineau On Wed, Aug 13, 2025 at 7:54 AM Matthieu Baerts <matttbe@kernel.org> wrote: > > Hi Christoph, > > On 07/08/2025 20:14, Christoph Paasch via B4 Relay wrote: > > From: Christoph Paasch <cpaasch@openai.com> > > > > Accelerated Receive Flow Steering (aRFS) relies on sockets recording > > their RX flow hash into the rps_sock_flow_table so that incoming packets > > are steered to the CPU where the application runs. > > > > With MPTCP, the application interacts with the parent MPTCP socket while > > data is carried over per-subflow TCP sockets. Without recording these > > subflows, aRFS cannot steer interrupts and RX processing for the flows > > to the desired CPU. > > > > Record all subflows in the RPS table by calling sock_rps_record_flow() > > for each subflow at the start of mptcp_sendmsg() and mptcp_recvmsg(). > > > > It does not by itself improve throughput, but ensures that IRQ and RX > > processing are directed to the right CPU, which is a > > prerequisite for effective aRFS. > > Thank you for looking at that! > > > Signed-off-by: Christoph Paasch <cpaasch@openai.com> > > --- > > net/mptcp/protocol.c | 16 ++++++++++++++++ > > 1 file changed, 16 insertions(+) > > > > diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c > > index 4b510e04724f5c8be08a00a1cc03093bcd031905..86f1fc5be47bd5f6dc8bf40a488de32f253c1beb 100644 > > --- a/net/mptcp/protocol.c > > +++ b/net/mptcp/protocol.c > > @@ -12,6 +12,7 @@ > > #include <linux/sched/signal.h> > > #include <linux/atomic.h> > > #include <net/aligned_data.h> > > +#include <net/rps.h> > > #include <net/sock.h> > > #include <net/inet_common.h> > > #include <net/inet_hashtables.h> > > @@ -1740,6 +1741,17 @@ static u32 mptcp_send_limit(const struct sock *sk) > > return limit - not_sent; > > } > > > > +static void mptcp_rps_record_subflows(const struct mptcp_sock *msk) > > +{ > > + struct mptcp_subflow_context *subflow; > > + > > + mptcp_for_each_subflow(msk, subflow) { > > + struct sock *ssk = mptcp_subflow_tcp_sock(subflow); > > + > > + sock_rps_record_flow(ssk); > > Should we eventually open code this RPS function and directly call > sock_rps_record_flow_hash()? If RPS is not needed, no need to go through > the subflow list. So we could add: #ifdef CONFIG_RPS + &rfs_needed > before the for_each, then check the state for each subflow, no? > > Or maybe such optimisation is not worth it? IMO it's not worth it. Because it would also mean that we pull in more of non-MPTCP things over to MPTCP files. > While at it, I see that sock_rps_record_flow() is called in > __inet_accept(), inet(6)_sendmsg and inet(6)_recvmsg, but with the msk. > I guess that's OK, right? AFAICS, that should be alright. sk_rxhash should not be set on an msk. Even on the passive opener side, we clone msk from the ssk, but at that point sk_rxhash is not set yet. (I may be wrong of course, I'm still not super familiar with the source-code...) > Also, is sock_rps_record_flow() called when a subflow is "accept()ed", > or is there something else to do there for this subflow to copy TCP's > behaviour? For new subflows, I don't think we can easily do that. Because sock_rps_record_flow() records the CPU on which it is running. From what I see, when new subflows get accepted, that is not happening in the application's context necessarily. So, we have to wait for a recvmsg/sendmsg. Christoph > > Cheers, > Matt > -- > Sponsored by the NGI0 Core fund. > ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH mptcp-next] mptcp: record subflows in RPS table 2025-08-14 3:31 ` Christoph Paasch @ 2025-08-14 11:20 ` Matthieu Baerts 2025-08-14 11:47 ` Matthieu Baerts 0 siblings, 1 reply; 8+ messages in thread From: Matthieu Baerts @ 2025-08-14 11:20 UTC (permalink / raw) To: Christoph Paasch; +Cc: MPTCP Upstream, Geliang Tang, Mat Martineau Hi Christoph, On 14/08/2025 05:31, Christoph Paasch wrote: > On Wed, Aug 13, 2025 at 7:54 AM Matthieu Baerts <matttbe@kernel.org> wrote: >> >> Hi Christoph, >> >> On 07/08/2025 20:14, Christoph Paasch via B4 Relay wrote: >>> From: Christoph Paasch <cpaasch@openai.com> >>> >>> Accelerated Receive Flow Steering (aRFS) relies on sockets recording >>> their RX flow hash into the rps_sock_flow_table so that incoming packets >>> are steered to the CPU where the application runs. >>> >>> With MPTCP, the application interacts with the parent MPTCP socket while >>> data is carried over per-subflow TCP sockets. Without recording these >>> subflows, aRFS cannot steer interrupts and RX processing for the flows >>> to the desired CPU. >>> >>> Record all subflows in the RPS table by calling sock_rps_record_flow() >>> for each subflow at the start of mptcp_sendmsg() and mptcp_recvmsg(). >>> >>> It does not by itself improve throughput, but ensures that IRQ and RX >>> processing are directed to the right CPU, which is a >>> prerequisite for effective aRFS. >> >> Thank you for looking at that! >> >>> Signed-off-by: Christoph Paasch <cpaasch@openai.com> >>> --- >>> net/mptcp/protocol.c | 16 ++++++++++++++++ >>> 1 file changed, 16 insertions(+) >>> >>> diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c >>> index 4b510e04724f5c8be08a00a1cc03093bcd031905..86f1fc5be47bd5f6dc8bf40a488de32f253c1beb 100644 >>> --- a/net/mptcp/protocol.c >>> +++ b/net/mptcp/protocol.c >>> @@ -12,6 +12,7 @@ >>> #include <linux/sched/signal.h> >>> #include <linux/atomic.h> >>> #include <net/aligned_data.h> >>> +#include <net/rps.h> >>> #include <net/sock.h> >>> #include <net/inet_common.h> >>> #include <net/inet_hashtables.h> >>> @@ -1740,6 +1741,17 @@ static u32 mptcp_send_limit(const struct sock *sk) >>> return limit - not_sent; >>> } >>> >>> +static void mptcp_rps_record_subflows(const struct mptcp_sock *msk) >>> +{ >>> + struct mptcp_subflow_context *subflow; >>> + >>> + mptcp_for_each_subflow(msk, subflow) { >>> + struct sock *ssk = mptcp_subflow_tcp_sock(subflow); >>> + >>> + sock_rps_record_flow(ssk); >> >> Should we eventually open code this RPS function and directly call >> sock_rps_record_flow_hash()? If RPS is not needed, no need to go through >> the subflow list. So we could add: #ifdef CONFIG_RPS + &rfs_needed >> before the for_each, then check the state for each subflow, no? >> >> Or maybe such optimisation is not worth it? > > IMO it's not worth it. Because it would also mean that we pull in more > of non-MPTCP things over to MPTCP files. Good point for the maintenance! But then, what about adding a small helper in include/net/rps.h? Here we could have this at the top of mptcp_rps_record_subflows(): if (!sock_rps_record_needed()) /* with likely()? */ return; I'm suggesting this because rfs is disabled by default, so it sounds better to check that before iterating over different subflows for nothing, no? >> While at it, I see that sock_rps_record_flow() is called in >> __inet_accept(), inet(6)_sendmsg and inet(6)_recvmsg, but with the msk. >> I guess that's OK, right? > > AFAICS, that should be alright. sk_rxhash should not be set on an msk. > Even on the passive opener side, we clone msk from the ssk, but at > that point sk_rxhash is not set yet. (I may be wrong of course, I'm > still not super familiar with the source-code...) Thank you for having looked! I guess you would have seen issues when testing if there was a problem with that, so no need to investigate more. >> Also, is sock_rps_record_flow() called when a subflow is "accept()ed", >> or is there something else to do there for this subflow to copy TCP's >> behaviour? > > For new subflows, I don't think we can easily do that. Because > sock_rps_record_flow() records the CPU on which it is running. From > what I see, when new subflows get accepted, that is not happening in > the application's context necessarily. So, we have to wait for a > recvmsg/sendmsg. Ah yes, of course, we should not do it there for additional subflows! And for the first subflow, I think that it will be done because when an accept() is triggered from the userspace, mptcp_stream_accept(msk, arg) will call inet_csk_accept(ssk, arg) that will end up doing that I think. Cheers, Matt -- Sponsored by the NGI0 Core fund. ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH mptcp-next] mptcp: record subflows in RPS table 2025-08-14 11:20 ` Matthieu Baerts @ 2025-08-14 11:47 ` Matthieu Baerts 2025-08-26 4:10 ` Christoph Paasch 0 siblings, 1 reply; 8+ messages in thread From: Matthieu Baerts @ 2025-08-14 11:47 UTC (permalink / raw) To: Christoph Paasch; +Cc: MPTCP Upstream, Geliang Tang, Mat Martineau Hi Christoph, On 14/08/2025 13:20, Matthieu Baerts wrote: > Hi Christoph, > > On 14/08/2025 05:31, Christoph Paasch wrote: >> On Wed, Aug 13, 2025 at 7:54 AM Matthieu Baerts <matttbe@kernel.org> wrote: >>> >>> Hi Christoph, >>> >>> On 07/08/2025 20:14, Christoph Paasch via B4 Relay wrote: >>>> From: Christoph Paasch <cpaasch@openai.com> >>>> >>>> Accelerated Receive Flow Steering (aRFS) relies on sockets recording >>>> their RX flow hash into the rps_sock_flow_table so that incoming packets >>>> are steered to the CPU where the application runs. >>>> >>>> With MPTCP, the application interacts with the parent MPTCP socket while >>>> data is carried over per-subflow TCP sockets. Without recording these >>>> subflows, aRFS cannot steer interrupts and RX processing for the flows >>>> to the desired CPU. >>>> >>>> Record all subflows in the RPS table by calling sock_rps_record_flow() >>>> for each subflow at the start of mptcp_sendmsg() and mptcp_recvmsg(). >>>> >>>> It does not by itself improve throughput, but ensures that IRQ and RX >>>> processing are directed to the right CPU, which is a >>>> prerequisite for effective aRFS. >>> >>> Thank you for looking at that! >>> >>>> Signed-off-by: Christoph Paasch <cpaasch@openai.com> >>>> --- >>>> net/mptcp/protocol.c | 16 ++++++++++++++++ >>>> 1 file changed, 16 insertions(+) >>>> >>>> diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c >>>> index 4b510e04724f5c8be08a00a1cc03093bcd031905..86f1fc5be47bd5f6dc8bf40a488de32f253c1beb 100644 >>>> --- a/net/mptcp/protocol.c >>>> +++ b/net/mptcp/protocol.c >>>> @@ -12,6 +12,7 @@ >>>> #include <linux/sched/signal.h> >>>> #include <linux/atomic.h> >>>> #include <net/aligned_data.h> >>>> +#include <net/rps.h> >>>> #include <net/sock.h> >>>> #include <net/inet_common.h> >>>> #include <net/inet_hashtables.h> >>>> @@ -1740,6 +1741,17 @@ static u32 mptcp_send_limit(const struct sock *sk) >>>> return limit - not_sent; >>>> } >>>> >>>> +static void mptcp_rps_record_subflows(const struct mptcp_sock *msk) >>>> +{ >>>> + struct mptcp_subflow_context *subflow; >>>> + >>>> + mptcp_for_each_subflow(msk, subflow) { >>>> + struct sock *ssk = mptcp_subflow_tcp_sock(subflow); >>>> + >>>> + sock_rps_record_flow(ssk); >>> >>> Should we eventually open code this RPS function and directly call >>> sock_rps_record_flow_hash()? If RPS is not needed, no need to go through >>> the subflow list. So we could add: #ifdef CONFIG_RPS + &rfs_needed >>> before the for_each, then check the state for each subflow, no? >>> >>> Or maybe such optimisation is not worth it? >> >> IMO it's not worth it. Because it would also mean that we pull in more >> of non-MPTCP things over to MPTCP files. > > Good point for the maintenance! > > But then, what about adding a small helper in include/net/rps.h? Here we > could have this at the top of mptcp_rps_record_subflows(): > > if (!sock_rps_record_needed()) /* with likely()? */ > return; > > I'm suggesting this because rfs is disabled by default, so it sounds > better to check that before iterating over different subflows for > nothing, no? > >>> While at it, I see that sock_rps_record_flow() is called in >>> __inet_accept(), inet(6)_sendmsg and inet(6)_recvmsg, but with the msk. >>> I guess that's OK, right? >> >> AFAICS, that should be alright. sk_rxhash should not be set on an msk. >> Even on the passive opener side, we clone msk from the ssk, but at >> that point sk_rxhash is not set yet. (I may be wrong of course, I'm >> still not super familiar with the source-code...) > > Thank you for having looked! I guess you would have seen issues when > testing if there was a problem with that, so no need to investigate more. > >>> Also, is sock_rps_record_flow() called when a subflow is "accept()ed", >>> or is there something else to do there for this subflow to copy TCP's >>> behaviour? >> >> For new subflows, I don't think we can easily do that. Because >> sock_rps_record_flow() records the CPU on which it is running. From >> what I see, when new subflows get accepted, that is not happening in >> the application's context necessarily. So, we have to wait for a >> recvmsg/sendmsg. > > Ah yes, of course, we should not do it there for additional subflows! > And for the first subflow, I think that it will be done because when an > accept() is triggered from the userspace, mptcp_stream_accept(msk, arg) > will call inet_csk_accept(ssk, arg) that will end up doing that I think. My bad, I think __inet_accept will not be called from inet_csk_accept(). For TCP, I think the path is: inet_accept() --> inet_csk_accept() --> __inet_accept() But with MPTCP, mptcp_stream_accept is called instead of inet_accept(), and I don't think __inet_accept() will be called on a subflow. (I didn't validate that.) Then, maybe we should call sock_rps_record_flow() on the newsk in mptcp_stream_accept, no? Cheers, Matt -- Sponsored by the NGI0 Core fund. ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH mptcp-next] mptcp: record subflows in RPS table 2025-08-14 11:47 ` Matthieu Baerts @ 2025-08-26 4:10 ` Christoph Paasch 0 siblings, 0 replies; 8+ messages in thread From: Christoph Paasch @ 2025-08-26 4:10 UTC (permalink / raw) To: Matthieu Baerts; +Cc: MPTCP Upstream, Geliang Tang, Mat Martineau On Thu, Aug 14, 2025 at 4:47 AM Matthieu Baerts <matttbe@kernel.org> wrote: > > Hi Christoph, > > On 14/08/2025 13:20, Matthieu Baerts wrote: > > Hi Christoph, > > > > On 14/08/2025 05:31, Christoph Paasch wrote: > >> On Wed, Aug 13, 2025 at 7:54 AM Matthieu Baerts <matttbe@kernel.org> wrote: > >>> > >>> Hi Christoph, > >>> > >>> On 07/08/2025 20:14, Christoph Paasch via B4 Relay wrote: > >>>> From: Christoph Paasch <cpaasch@openai.com> > >>>> > >>>> Accelerated Receive Flow Steering (aRFS) relies on sockets recording > >>>> their RX flow hash into the rps_sock_flow_table so that incoming packets > >>>> are steered to the CPU where the application runs. > >>>> > >>>> With MPTCP, the application interacts with the parent MPTCP socket while > >>>> data is carried over per-subflow TCP sockets. Without recording these > >>>> subflows, aRFS cannot steer interrupts and RX processing for the flows > >>>> to the desired CPU. > >>>> > >>>> Record all subflows in the RPS table by calling sock_rps_record_flow() > >>>> for each subflow at the start of mptcp_sendmsg() and mptcp_recvmsg(). > >>>> > >>>> It does not by itself improve throughput, but ensures that IRQ and RX > >>>> processing are directed to the right CPU, which is a > >>>> prerequisite for effective aRFS. > >>> > >>> Thank you for looking at that! > >>> > >>>> Signed-off-by: Christoph Paasch <cpaasch@openai.com> > >>>> --- > >>>> net/mptcp/protocol.c | 16 ++++++++++++++++ > >>>> 1 file changed, 16 insertions(+) > >>>> > >>>> diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c > >>>> index 4b510e04724f5c8be08a00a1cc03093bcd031905..86f1fc5be47bd5f6dc8bf40a488de32f253c1beb 100644 > >>>> --- a/net/mptcp/protocol.c > >>>> +++ b/net/mptcp/protocol.c > >>>> @@ -12,6 +12,7 @@ > >>>> #include <linux/sched/signal.h> > >>>> #include <linux/atomic.h> > >>>> #include <net/aligned_data.h> > >>>> +#include <net/rps.h> > >>>> #include <net/sock.h> > >>>> #include <net/inet_common.h> > >>>> #include <net/inet_hashtables.h> > >>>> @@ -1740,6 +1741,17 @@ static u32 mptcp_send_limit(const struct sock *sk) > >>>> return limit - not_sent; > >>>> } > >>>> > >>>> +static void mptcp_rps_record_subflows(const struct mptcp_sock *msk) > >>>> +{ > >>>> + struct mptcp_subflow_context *subflow; > >>>> + > >>>> + mptcp_for_each_subflow(msk, subflow) { > >>>> + struct sock *ssk = mptcp_subflow_tcp_sock(subflow); > >>>> + > >>>> + sock_rps_record_flow(ssk); > >>> > >>> Should we eventually open code this RPS function and directly call > >>> sock_rps_record_flow_hash()? If RPS is not needed, no need to go through > >>> the subflow list. So we could add: #ifdef CONFIG_RPS + &rfs_needed > >>> before the for_each, then check the state for each subflow, no? > >>> > >>> Or maybe such optimisation is not worth it? > >> > >> IMO it's not worth it. Because it would also mean that we pull in more > >> of non-MPTCP things over to MPTCP files. > > > > Good point for the maintenance! > > > > But then, what about adding a small helper in include/net/rps.h? Here we > > could have this at the top of mptcp_rps_record_subflows(): > > > > if (!sock_rps_record_needed()) /* with likely()? */ > > return; > > > > I'm suggesting this because rfs is disabled by default, so it sounds > > better to check that before iterating over different subflows for > > nothing, no? I see! Yes, that may be a good addition. I am doing that. > >>> While at it, I see that sock_rps_record_flow() is called in > >>> __inet_accept(), inet(6)_sendmsg and inet(6)_recvmsg, but with the msk. > >>> I guess that's OK, right? > >> > >> AFAICS, that should be alright. sk_rxhash should not be set on an msk. > >> Even on the passive opener side, we clone msk from the ssk, but at > >> that point sk_rxhash is not set yet. (I may be wrong of course, I'm > >> still not super familiar with the source-code...) > > > > Thank you for having looked! I guess you would have seen issues when > > testing if there was a problem with that, so no need to investigate more. > > > >>> Also, is sock_rps_record_flow() called when a subflow is "accept()ed", > >>> or is there something else to do there for this subflow to copy TCP's > >>> behaviour? > >> > >> For new subflows, I don't think we can easily do that. Because > >> sock_rps_record_flow() records the CPU on which it is running. From > >> what I see, when new subflows get accepted, that is not happening in > >> the application's context necessarily. So, we have to wait for a > >> recvmsg/sendmsg. > > > > Ah yes, of course, we should not do it there for additional subflows! > > And for the first subflow, I think that it will be done because when an > > accept() is triggered from the userspace, mptcp_stream_accept(msk, arg) > > will call inet_csk_accept(ssk, arg) that will end up doing that I think. > > My bad, I think __inet_accept will not be called from inet_csk_accept(). > For TCP, I think the path is: > > inet_accept() > --> inet_csk_accept() > --> __inet_accept() > > But with MPTCP, mptcp_stream_accept is called instead of inet_accept(), > and I don't think __inet_accept() will be called on a subflow. (I didn't > validate that.) Then, maybe we should call sock_rps_record_flow() on the > newsk in mptcp_stream_accept, no? Indeed! I will iterate over the subflows there as well and take care of them. Christoph ^ permalink raw reply [flat|nested] 8+ messages in thread
end of thread, other threads:[~2025-08-26 4:10 UTC | newest] Thread overview: 8+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2025-08-07 18:14 [PATCH mptcp-next] mptcp: record subflows in RPS table Christoph Paasch 2025-08-07 18:14 ` Christoph Paasch via B4 Relay 2025-08-07 19:46 ` MPTCP CI 2025-08-13 14:53 ` Matthieu Baerts 2025-08-14 3:31 ` Christoph Paasch 2025-08-14 11:20 ` Matthieu Baerts 2025-08-14 11:47 ` Matthieu Baerts 2025-08-26 4:10 ` Christoph Paasch
This is an external index of several public inboxes, see mirroring instructions on how to clone and mirror all data and code used by this external index.