* [RFC mptcp-next] tcp: mptcp: use mptcp receive buffer space to select rcv window
@ 2020-03-18 14:19 Florian Westphal
2020-03-18 17:59 ` Eric Dumazet
0 siblings, 1 reply; 5+ messages in thread
From: Florian Westphal @ 2020-03-18 14:19 UTC (permalink / raw)
To: netdev; +Cc: mptcp, Florian Westphal, Eric Dumazet
In MPTCP, the receive windo is shared across all subflows, because it
refers to the mptcp-level sequence space.
This commit doesn't change choice of initial window for passive or active
connections.
While it would be possible to change those as well, this adds complexity
(especially when handling MP_JOIN requests).
However, the MPTCP RFC specifically says that a MPTCP sender
'MUST NOT use the RCV.WND field of a TCP segment at the connection level if
it does not also carry a DSS option with a Data ACK field.'
SYN/SYNACK packets do not carry a DSS option with a Data ACK field.
CC: Eric Dumazet <edumazet@google.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
---
This patch would add additional direct call in __tcp_select_window().
I looked at mptcp option writing to check if it could be done there but
that seemed worse.
include/net/mptcp.h | 3 +++
net/ipv4/tcp_output.c | 5 +++++
net/mptcp/subflow.c | 17 +++++++++++++++++
3 files changed, 25 insertions(+)
diff --git a/include/net/mptcp.h b/include/net/mptcp.h
index 7489f9267640..1ef4520f45c3 100644
--- a/include/net/mptcp.h
+++ b/include/net/mptcp.h
@@ -66,6 +66,8 @@ static inline bool rsk_is_mptcp(const struct request_sock *req)
return tcp_rsk(req)->is_mptcp;
}
+void mptcp_space(const struct sock *ssk, int *space, int *full_space);
+
void mptcp_parse_option(const struct sk_buff *skb, const unsigned char *ptr,
int opsize, struct tcp_options_received *opt_rx);
bool mptcp_syn_options(struct sock *sk, const struct sk_buff *skb,
@@ -195,6 +197,7 @@ static inline bool mptcp_sk_is_subflow(const struct sock *sk)
return false;
}
+static inline void mptcp_space(const struct sock *ssk, int *s, int *fs) { }
static inline void mptcp_seq_show(struct seq_file *seq) { }
#endif /* CONFIG_MPTCP */
diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index 306e25d743e8..1a829536a115 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -2771,6 +2771,11 @@ u32 __tcp_select_window(struct sock *sk)
int full_space = min_t(int, tp->window_clamp, allowed_space);
int window;
+ if (sk_is_mptcp(sk)) {
+ mptcp_space(sk, &free_space, &allowed_space);
+ full_space = min_t(int, tp->window_clamp, allowed_space);
+ }
+
if (unlikely(mss > full_space)) {
mss = full_space;
if (mss <= 0)
diff --git a/net/mptcp/subflow.c b/net/mptcp/subflow.c
index 40ad7995b13b..aefcbb8bb737 100644
--- a/net/mptcp/subflow.c
+++ b/net/mptcp/subflow.c
@@ -745,6 +745,23 @@ bool mptcp_subflow_data_available(struct sock *sk)
return subflow->data_avail;
}
+/* If ssk has an mptcp parent socket, use the mptcp rcvbuf occupancy,
+ * not the ssk one.
+ *
+ * In mptcp, rwin is about the mptcp-level connection data.
+ *
+ * Data that is still on the ssk rx queue can thus be ignored,
+ * as far as mptcp peer is concerened that data is still inflight.
+ */
+void mptcp_space(const struct sock *ssk, int *space, int *full_space)
+{
+ const struct mptcp_subflow_context *subflow = mptcp_subflow_ctx(ssk);
+ const struct sock *sk = READ_ONCE(subflow->conn);
+
+ *space = tcp_space(sk);
+ *full_space = tcp_full_space(sk);
+}
+
static void subflow_data_ready(struct sock *sk)
{
struct mptcp_subflow_context *subflow = mptcp_subflow_ctx(sk);
--
2.24.1
^ permalink raw reply related [flat|nested] 5+ messages in thread* Re: [RFC mptcp-next] tcp: mptcp: use mptcp receive buffer space to select rcv window
2020-03-18 14:19 [RFC mptcp-next] tcp: mptcp: use mptcp receive buffer space to select rcv window Florian Westphal
@ 2020-03-18 17:59 ` Eric Dumazet
2020-03-18 18:05 ` Florian Westphal
0 siblings, 1 reply; 5+ messages in thread
From: Eric Dumazet @ 2020-03-18 17:59 UTC (permalink / raw)
To: Florian Westphal, netdev; +Cc: mptcp, Eric Dumazet
On 3/18/20 7:19 AM, Florian Westphal wrote:
> In MPTCP, the receive windo is shared across all subflows, because it
> refers to the mptcp-level sequence space.
>
> This commit doesn't change choice of initial window for passive or active
> connections.
> While it would be possible to change those as well, this adds complexity
> (especially when handling MP_JOIN requests).
>
> However, the MPTCP RFC specifically says that a MPTCP sender
> 'MUST NOT use the RCV.WND field of a TCP segment at the connection level if
> it does not also carry a DSS option with a Data ACK field.'
>
> SYN/SYNACK packets do not carry a DSS option with a Data ACK field.
>
> CC: Eric Dumazet <edumazet@google.com>
> Signed-off-by: Florian Westphal <fw@strlen.de>
> ---
> This patch would add additional direct call in __tcp_select_window().
>
> I looked at mptcp option writing to check if it could be done there but
> that seemed worse.
>
> include/net/mptcp.h | 3 +++
> net/ipv4/tcp_output.c | 5 +++++
> net/mptcp/subflow.c | 17 +++++++++++++++++
> 3 files changed, 25 insertions(+)
>
> diff --git a/include/net/mptcp.h b/include/net/mptcp.h
> index 7489f9267640..1ef4520f45c3 100644
> --- a/include/net/mptcp.h
> +++ b/include/net/mptcp.h
> @@ -66,6 +66,8 @@ static inline bool rsk_is_mptcp(const struct request_sock *req)
> return tcp_rsk(req)->is_mptcp;
> }
>
> +void mptcp_space(const struct sock *ssk, int *space, int *full_space);
> +
> void mptcp_parse_option(const struct sk_buff *skb, const unsigned char *ptr,
> int opsize, struct tcp_options_received *opt_rx);
> bool mptcp_syn_options(struct sock *sk, const struct sk_buff *skb,
> @@ -195,6 +197,7 @@ static inline bool mptcp_sk_is_subflow(const struct sock *sk)
> return false;
> }
>
> +static inline void mptcp_space(const struct sock *ssk, int *s, int *fs) { }
> static inline void mptcp_seq_show(struct seq_file *seq) { }
> #endif /* CONFIG_MPTCP */
>
> diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
> index 306e25d743e8..1a829536a115 100644
> --- a/net/ipv4/tcp_output.c
> +++ b/net/ipv4/tcp_output.c
> @@ -2771,6 +2771,11 @@ u32 __tcp_select_window(struct sock *sk)
> int full_space = min_t(int, tp->window_clamp, allowed_space);
> int window;
>
> + if (sk_is_mptcp(sk)) {
> + mptcp_space(sk, &free_space, &allowed_space);
> + full_space = min_t(int, tp->window_clamp, allowed_space);
> + }
You could move the full_space = min_t(int, tp->window_clamp, allowed_space);
after this block factorize it.
> +
> if (unlikely(mss > full_space)) {
> mss = full_space;
> if (mss <= 0)
> diff --git a/net/mptcp/subflow.c b/net/mptcp/subflow.c
> index 40ad7995b13b..aefcbb8bb737 100644
> --- a/net/mptcp/subflow.c
> +++ b/net/mptcp/subflow.c
> @@ -745,6 +745,23 @@ bool mptcp_subflow_data_available(struct sock *sk)
> return subflow->data_avail;
> }
>
> +/* If ssk has an mptcp parent socket, use the mptcp rcvbuf occupancy,
> + * not the ssk one.
> + *
> + * In mptcp, rwin is about the mptcp-level connection data.
> + *
> + * Data that is still on the ssk rx queue can thus be ignored,
> + * as far as mptcp peer is concerened that data is still inflight.
> + */
> +void mptcp_space(const struct sock *ssk, int *space, int *full_space)
> +{
> + const struct mptcp_subflow_context *subflow = mptcp_subflow_ctx(ssk);
> + const struct sock *sk = READ_ONCE(subflow->conn);
What are the rules protecting subflow->conn lifetime ?
Why dereferencing sk after this line is safe ?
> +
> + *space = tcp_space(sk);
> + *full_space = tcp_full_space(sk);
> +}
> +
> static void subflow_data_ready(struct sock *sk)
> {
> struct mptcp_subflow_context *subflow = mptcp_subflow_ctx(sk);
>
^ permalink raw reply [flat|nested] 5+ messages in thread* Re: [RFC mptcp-next] tcp: mptcp: use mptcp receive buffer space to select rcv window
2020-03-18 17:59 ` Eric Dumazet
@ 2020-03-18 18:05 ` Florian Westphal
2020-03-18 18:14 ` Eric Dumazet
0 siblings, 1 reply; 5+ messages in thread
From: Florian Westphal @ 2020-03-18 18:05 UTC (permalink / raw)
To: Eric Dumazet; +Cc: Florian Westphal, netdev, mptcp, Eric Dumazet
Eric Dumazet <eric.dumazet@gmail.com> wrote:
> > @@ -2771,6 +2771,11 @@ u32 __tcp_select_window(struct sock *sk)
> > int full_space = min_t(int, tp->window_clamp, allowed_space);
> > int window;
> >
> > + if (sk_is_mptcp(sk)) {
> > + mptcp_space(sk, &free_space, &allowed_space);
> > + full_space = min_t(int, tp->window_clamp, allowed_space);
> > + }
>
> You could move the full_space = min_t(int, tp->window_clamp, allowed_space);
> after this block factorize it.
Indeed, will do.
> > diff --git a/net/mptcp/subflow.c b/net/mptcp/subflow.c
> > index 40ad7995b13b..aefcbb8bb737 100644
> > --- a/net/mptcp/subflow.c
> > +++ b/net/mptcp/subflow.c
> > @@ -745,6 +745,23 @@ bool mptcp_subflow_data_available(struct sock *sk)
> > return subflow->data_avail;
> > }
> >
> > +/* If ssk has an mptcp parent socket, use the mptcp rcvbuf occupancy,
> > + * not the ssk one.
> > + *
> > + * In mptcp, rwin is about the mptcp-level connection data.
> > + *
> > + * Data that is still on the ssk rx queue can thus be ignored,
> > + * as far as mptcp peer is concerened that data is still inflight.
> > + */
> > +void mptcp_space(const struct sock *ssk, int *space, int *full_space)
> > +{
> > + const struct mptcp_subflow_context *subflow = mptcp_subflow_ctx(ssk);
> > + const struct sock *sk = READ_ONCE(subflow->conn);
>
> What are the rules protecting subflow->conn lifetime ?
>
> Why dereferencing sk after this line is safe ?
Subflow sockets hold a reference on the master/parent mptcp-socket.
^ permalink raw reply [flat|nested] 5+ messages in thread* Re: [RFC mptcp-next] tcp: mptcp: use mptcp receive buffer space to select rcv window
2020-03-18 18:05 ` Florian Westphal
@ 2020-03-18 18:14 ` Eric Dumazet
2020-03-18 20:59 ` Florian Westphal
0 siblings, 1 reply; 5+ messages in thread
From: Eric Dumazet @ 2020-03-18 18:14 UTC (permalink / raw)
To: Florian Westphal, Eric Dumazet; +Cc: netdev, mptcp, Eric Dumazet
On 3/18/20 11:05 AM, Florian Westphal wrote:
> Eric Dumazet <eric.dumazet@gmail.com> wrote:
>>> @@ -2771,6 +2771,11 @@ u32 __tcp_select_window(struct sock *sk)
>>> int full_space = min_t(int, tp->window_clamp, allowed_space);
>>> int window;
>>>
>>> + if (sk_is_mptcp(sk)) {
>>> + mptcp_space(sk, &free_space, &allowed_space);
>>> + full_space = min_t(int, tp->window_clamp, allowed_space);
>>> + }
>>
>> You could move the full_space = min_t(int, tp->window_clamp, allowed_space);
>> after this block factorize it.
>
> Indeed, will do.
>
>>> diff --git a/net/mptcp/subflow.c b/net/mptcp/subflow.c
>>> index 40ad7995b13b..aefcbb8bb737 100644
>>> --- a/net/mptcp/subflow.c
>>> +++ b/net/mptcp/subflow.c
>>> @@ -745,6 +745,23 @@ bool mptcp_subflow_data_available(struct sock *sk)
>>> return subflow->data_avail;
>>> }
>>>
>>> +/* If ssk has an mptcp parent socket, use the mptcp rcvbuf occupancy,
>>> + * not the ssk one.
>>> + *
>>> + * In mptcp, rwin is about the mptcp-level connection data.
>>> + *
>>> + * Data that is still on the ssk rx queue can thus be ignored,
>>> + * as far as mptcp peer is concerened that data is still inflight.
>>> + */
>>> +void mptcp_space(const struct sock *ssk, int *space, int *full_space)
>>> +{
>>> + const struct mptcp_subflow_context *subflow = mptcp_subflow_ctx(ssk);
>>> + const struct sock *sk = READ_ONCE(subflow->conn);
>>
>> What are the rules protecting subflow->conn lifetime ?
>>
>> Why dereferencing sk after this line is safe ?
>
> Subflow sockets hold a reference on the master/parent mptcp-socket.
>
Presence of READ_ONCE() tells something might happen on
this pointer after you read it.
Can this pointer be set while this thread is owning the socket lock ?
If not, then you do not need READ_ONCE(), this is confusing.
If yes, then it means that whatever changes the pointer might also release the reference
on the old object.
Thanks.
^ permalink raw reply [flat|nested] 5+ messages in thread* Re: [RFC mptcp-next] tcp: mptcp: use mptcp receive buffer space to select rcv window
2020-03-18 18:14 ` Eric Dumazet
@ 2020-03-18 20:59 ` Florian Westphal
0 siblings, 0 replies; 5+ messages in thread
From: Florian Westphal @ 2020-03-18 20:59 UTC (permalink / raw)
To: Eric Dumazet; +Cc: Florian Westphal, netdev, mptcp, Eric Dumazet
Eric Dumazet <eric.dumazet@gmail.com> wrote:
> >>> +/* If ssk has an mptcp parent socket, use the mptcp rcvbuf occupancy,
> >>> + * not the ssk one.
> >>> + *
> >>> + * In mptcp, rwin is about the mptcp-level connection data.
> >>> + *
> >>> + * Data that is still on the ssk rx queue can thus be ignored,
> >>> + * as far as mptcp peer is concerened that data is still inflight.
> >>> + */
> >>> +void mptcp_space(const struct sock *ssk, int *space, int *full_space)
> >>> +{
> >>> + const struct mptcp_subflow_context *subflow = mptcp_subflow_ctx(ssk);
> >>> + const struct sock *sk = READ_ONCE(subflow->conn);
> >>
> >> What are the rules protecting subflow->conn lifetime ?
> >>
> >> Why dereferencing sk after this line is safe ?
> >
> > Subflow sockets hold a reference on the master/parent mptcp-socket.
> >
>
> Presence of READ_ONCE() tells something might happen on
> this pointer after you read it.
Right, sorry about this. The READ_ONCE() isn't needed anymore after
recent improvement from Paolo.
> Can this pointer be set while this thread is owning the socket lock ?
Only by the one holding the sk lock, so no race.
> If not, then you do not need READ_ONCE(), this is confusing.
Yes.
> If yes, then it means that whatever changes the pointer might also release the reference
> on the old object.
The reference is released only after aquiring the socket lock.
^ permalink raw reply [flat|nested] 5+ messages in thread
end of thread, other threads:[~2020-03-18 20:59 UTC | newest]
Thread overview: 5+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2020-03-18 14:19 [RFC mptcp-next] tcp: mptcp: use mptcp receive buffer space to select rcv window Florian Westphal
2020-03-18 17:59 ` Eric Dumazet
2020-03-18 18:05 ` Florian Westphal
2020-03-18 18:14 ` Eric Dumazet
2020-03-18 20:59 ` Florian Westphal
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).